History log of /netbsd-current/sys/netinet/ip_input.c
Revision Date Author Comments
# 1.402 02-Sep-2022 thorpej

pktqueue: Re-factor sysctl handling.

Provide a new pktq_sysctl_setup() function that attaches standard
pktq sysctl nodes below a specified parent node, with either a
fixed node ID or CTL_CREATE to dynamically assign node IDs. Make
all of the sysctl handlers private to pktqueue.c, and remove the
INET- and INET6-specific pktqueue sysctl code from net/if.c.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.401 08-Mar-2021 christos

remove now unused pseudo-random ip id code.


# 1.400 07-Mar-2021 christos

netinet: Enable random IP fragment ids by default (from riastradh)


# 1.399 19-Feb-2021 christos

- Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
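A small userland sketch shows the difference between the old `sizeof`-based test and the new alignment-based one. The macros `EX_ALIGNED_SIZEOF` and `EX_ALIGNED_ALIGNOF` and the type `struct ex_hdr` are hypothetical illustrations, not the NetBSD definitions. The sketch uses C11 `_Alignof` in place of the compiler's `__alignof` and assumes a common ABI on which `struct ex_hdr` is 8 bytes with 4-byte alignment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A header-like type whose size (8 on common ABIs) exceeds its
 * alignment requirement (4, inherited from the uint32_t member). */
struct ex_hdr {
	uint32_t word;
	uint8_t byte;
};

/* Old style (hypothetical): mask with sizeof(t) - 1. This is only meaningful
 * when sizeof(t) is a power of two, and it is stricter than the ABI
 * requires for padded structs. */
#define EX_ALIGNED_SIZEOF(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* New style (hypothetical): mask with the ABI alignment of t, which is
 * always a power of two. */
#define EX_ALIGNED_ALIGNOF(p, t) \
	((((uintptr_t)(p)) & (_Alignof(t) - 1)) == 0)

/* Evaluate both tests for a pointer `offset` bytes into an 8-byte
 * aligned buffer. */
static void
ex_check(size_t offset, bool *by_size, bool *by_align)
{
	static _Alignas(8) unsigned char buf[64];
	const unsigned char *p = buf + offset;

	*by_size = EX_ALIGNED_SIZEOF(p, struct ex_hdr);
	*by_align = EX_ALIGNED_ALIGNOF(p, struct ex_hdr);
}
```

At offset 4 the `sizeof` test rejects a pointer that the ABI considers properly aligned, while the `_Alignof` test accepts it. A type whose size is not a power of two (say 12) would make the old mask meaningless altogether.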


# 1.398 14-Feb-2021 christos

- centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in half of the macros.
- fix an issue in the ipv6 l2tp code where it was aligning for ipv4 and pulling
for ipv6.


# 1.397 28-Aug-2020 ozaki-r

branches: 1.397.2;
inet: reduce silent packet discards


# 1.396 28-Aug-2020 ozaki-r

inet: pull m_get_rcvif_psref out of ip_input for simplicity

Same as ip6_input.


# 1.395 28-Aug-2020 ozaki-r

ipsec: rename ipsec_ip_input to ipsec_ip_input_checkpolicy

Because it just checks if a packet passes security policies.


# 1.394 28-Aug-2020 ozaki-r

inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.393 13-Nov-2019 ozaki-r

Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.


# 1.392 19-Sep-2019 ozaki-r

Apply some missing changes lost on the previous commit


# 1.391 19-Sep-2019 ozaki-r

Avoid having a rtcache directly in a percpu storage

percpu(9) maintains a memory area for each CPU and hands out pieces of it
to users. If the areas run short, percpu(9) enlarges them by allocating new,
larger memory areas, replacing the old ones with them, and destroying the old
ones. A percpu storage referenced by a pointer obtained via percpu_getref can
therefore be destroyed by this mechanism after the running thread sleeps, even
if percpu_putref has not been called yet.

Using a rtcache, i.e., packet processing, typically involves sleepable
operations such as rwlocks, so we must avoid dereferencing a rtcache that is
stored directly in percpu storage during packet processing. Address this
situation by storing only a pointer to a rtcache in percpu storage instead.

Reviewed by knakahara@ and yamaguchi@
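A userland analogy makes the hazard easier to see. Here `realloc()` plays the role of percpu(9) replacing its storage with a larger area. This is a sketch under that assumption only: the names are hypothetical, and there is no per-CPU state or sleeping. It shows only why a slot that holds a pointer survives relocation, whereas a pointer into the old storage would not.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a route cache entry. */
struct ex_rtcache {
	int hits;
};

/*
 * Hypothetical container whose backing array may move when it grows,
 * playing the role of percpu(9) enlarging its storage. Each slot holds
 * only a pointer; every rtcache lives in its own allocation.
 */
struct ex_storage {
	struct ex_rtcache **slots;
	size_t nslots;
};

static void
ex_storage_fini(struct ex_storage *st)
{
	for (size_t i = 0; i < st->nslots; i++)
		free(st->slots[i]);	/* free(NULL) is a no-op */
	free(st->slots);
	st->slots = NULL;
	st->nslots = 0;
}

static int
ex_storage_init(struct ex_storage *st, size_t n)
{
	st->nslots = 0;
	st->slots = calloc(n, sizeof(*st->slots));
	if (st->slots == NULL)
		return -1;
	st->nslots = n;
	for (size_t i = 0; i < n; i++) {
		st->slots[i] = calloc(1, sizeof(**st->slots));
		if (st->slots[i] == NULL) {
			ex_storage_fini(st);
			return -1;
		}
	}
	return 0;
}

/*
 * Grow the slot array (n must be >= st->nslots). realloc() may move the
 * array, so a pointer into the old array would dangle, but pointers to
 * the rtcache objects themselves stay valid. New slots start empty.
 */
static int
ex_storage_grow(struct ex_storage *st, size_t n)
{
	struct ex_rtcache **ns = realloc(st->slots, n * sizeof(*ns));

	if (ns == NULL)
		return -1;
	for (size_t i = st->nslots; i < n; i++)
		ns[i] = NULL;
	st->slots = ns;
	st->nslots = n;
	return 0;
}
```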


# 1.390 15-Sep-2019 bouyer

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.389 13-May-2019 ozaki-r

branches: 1.389.2;
Count packets dropped by pfil


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.388 17-Jan-2019 knakahara

Fix ipsecif(4) being unable to apply input-direction packet filters. Reviewed by ozaki-r@n.o and ryo@n.o.

ATF tests will be added later.


Revision tags: pgoyette-compat-1226 pgoyette-compat-1126
# 1.387 15-Nov-2018 maxv

Remove the 't' argument from m_tag_find().


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.386 02-Sep-2018 maxv

remove reference to ipnat, and duplicate comments


Revision tags: pgoyette-compat-0728
# 1.385 10-Jul-2018 maxv

Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.384 17-May-2018 maxv

branches: 1.384.2;
Add KASSERTs, related to PR/39794.


# 1.383 14-May-2018 maxv

Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not set
(because the firewall already removed it), return early.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.


# 1.382 10-May-2018 maxv

Rename ipsec4_forward -> ipsec_mtu, and switch to void.


Revision tags: pgoyette-compat-0502
# 1.381 26-Apr-2018 maxv

Remove unused mbuf argument from sbsavetimestamp.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.380 15-Apr-2018 maxv

Introduce a m_verify_packet function that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function at "points of
interest", namely the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.
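The following minimal sketch shows one invariant such a check can enforce, using a hypothetical, stripped-down `struct ex_mbuf`: per-mbuf lengths must be non-negative and must add up to the packet-header length. It does not reproduce NetBSD's m_verify_packet, which may check more or different conditions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal mbuf: only the fields this check needs. */
struct ex_mbuf {
	struct ex_mbuf *m_next;	/* next mbuf in the chain */
	int m_len;		/* bytes of data held in this mbuf */
	int m_pkthdr_len;	/* total packet length (first mbuf only) */
};

/* Return true if no mbuf has a negative length and the per-mbuf
 * lengths add up to the packet-header length of the first mbuf. */
static bool
ex_verify_packet(const struct ex_mbuf *m)
{
	long total = 0;

	if (m == NULL)
		return false;
	for (const struct ex_mbuf *n = m; n != NULL; n = n->m_next) {
		if (n->m_len < 0)
			return false;
		total += n->m_len;
	}
	return total == m->m_pkthdr_len;
}
```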


# 1.379 11-Apr-2018 maxv

Don't pass IP_ALLOWBROADCAST in ipsec4_input. The flag lands in
ipsec_getpolicybyaddr, and only IP_FORWARDING is taken.

In fact it would be good to change the 'flags' argument of ipsec4_input
to be a boolean, same for ipsec_getpolicybyaddr. It would be less
misleading.


# 1.378 11-Apr-2018 maxv

Add comment about IPsec.


# 1.377 11-Apr-2018 maxv

Small changes in ip_dooptions: replace bcopy by memcpy, the areas can't
overlap.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.376 24-Feb-2018 ozaki-r

branches: 1.376.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because the lock ordering rule is violated:
softnet_lock is acquired while IFNET_LOCK is held.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to reliably protect the network stack.
This is required, for example, for PR kern/51356.

Fix PR kern/53043


# 1.375 09-Feb-2018 maxv

Remove dead code.


# 1.374 07-Feb-2018 maxv

Remove null check on ip, it can't be null. (Confuses code scanners.)


# 1.373 06-Feb-2018 maxv

Typos and style a bit, no functional change.


# 1.372 05-Feb-2018 maxv

Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.


# 1.371 05-Feb-2018 maxv

Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.


# 1.370 05-Feb-2018 maxv

Clean up this mess. This is typically the kind of places where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.


# 1.369 05-Feb-2018 maxv

Be tougher, and don't allow LSRR+SSRR (RFC7126).


# 1.368 05-Feb-2018 maxv

Kick duplicate options, they are not allowed (RFC791).
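The option rules from 1.368 and 1.369 can be sketched as a standalone validator over the raw options area. It uses the RFC 791 encoding: single-byte EOL and NOP, and otherwise a type byte, a length byte, and data. This is not NetBSD's ip_dooptions(). The function name is hypothetical, and the sketch assumes that any repeated option type other than NOP counts as a duplicate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Option type values from RFC 791. */
#define EX_IPOPT_EOL	0
#define EX_IPOPT_NOP	1
#define EX_IPOPT_LSRR	131
#define EX_IPOPT_SSRR	137

/* Return true if the options area is acceptable: every option is well
 * formed, no option type other than NOP appears twice, and LSRR and
 * SSRR are not both present. */
static bool
ex_check_ipopts(const uint8_t *cp, size_t cnt)
{
	bool seen[256] = { false };
	size_t optlen = 0;

	for (size_t off = 0; off < cnt; off += optlen) {
		uint8_t opt = cp[off];

		if (opt == EX_IPOPT_EOL)
			break;
		if (opt == EX_IPOPT_NOP) {
			optlen = 1;
			continue;
		}
		/* Multi-byte option: the length byte must exist, cover at
		 * least the type and length octets, and fit the buffer. */
		if (off + 1 >= cnt)
			return false;
		optlen = cp[off + 1];
		if (optlen < 2 || optlen > cnt - off)
			return false;
		if (seen[opt])
			return false;	/* duplicate option */
		seen[opt] = true;
	}
	if (seen[EX_IPOPT_LSRR] && seen[EX_IPOPT_SSRR])
		return false;
	return true;
}
```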


# 1.367 05-Feb-2018 maxv

Remove unused variable.


# 1.366 05-Feb-2018 maxv

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.


# 1.365 05-Feb-2018 maxv

Style, no functional change.


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux- or Solaris-compatible code expects, depending
on the data size, and add a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now) or, in the
future, if the calling application specifies a receiving buffer that
matches neither data item.

From: Tom Ivar Helbekkmo
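The size-based interpretation in item 4 can be sketched as follows. `struct ex_pktinfo`, `struct ex_sockstate` and `ex_set_pktinfo()` are hypothetical. The struct is only assumed to mirror the shape of NetBSD's `struct in_pktinfo` (an address plus an interface index). Returning `EINVAL` for any other size is an assumption of the sketch, not documented behaviour.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical mirror of an in_pktinfo: address plus interface index. */
struct ex_pktinfo {
	struct in_addr ipi_addr;
	unsigned int ipi_ifindex;
};

/* Hypothetical per-socket state affected by the option. */
struct ex_sockstate {
	bool recvpktinfo;		/* generate IP_PKTINFO cmsgs on receive */
	struct ex_pktinfo defaults;	/* default source address/interface */
};

/* Interpret an IP_PKTINFO value by its size: an int toggles reception
 * (Linux style), a pktinfo sets output defaults (Solaris style). */
static int
ex_set_pktinfo(struct ex_sockstate *so, const void *val, size_t len)
{
	if (len == sizeof(int)) {
		int on;

		memcpy(&on, val, sizeof(on));
		so->recvpktinfo = (on != 0);
		return 0;
	}
	if (len == sizeof(struct ex_pktinfo)) {
		memcpy(&so->defaults, val, sizeof(so->defaults));
		return 0;
	}
	return EINVAL;
}
```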


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

This reduces copy-and-paste code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify the remaining
KERNEL_LOCK and/or softnet_lock uses that are held even if NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, e.g., by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is split
out to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer to an interface in an mbuf

Having a pointer to an interface in an mbuf isn't safe once we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead,
we have to get the object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref, which use psref(9)
for sleepable critical sections, and m_{get,put}_rcvif, which use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and serves as a
transitional measure, i.e., it is intended for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance-sensitive paths;
however, the overhead is not serious: a 2% drop at worst.

Proposed on tech-kern and tech-net.
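The index-instead-of-pointer idea can be sketched in userland with a hypothetical table `ex_ifindex2ifnet` standing in for ifindex2ifnet. The sketch is single-threaded and omits the psref(9)/pserialize(9) protection the real API adds. It shows only that a packet recording an index can detect a detached interface, because the lookup returns NULL, whereas a stored pointer would silently dangle.

```c
#include <stddef.h>

#define EX_MAXIF 8	/* hypothetical table size */

/* Hypothetical interface object. */
struct ex_ifnet {
	unsigned int if_index;
	const char *if_xname;
};

/* Hypothetical index-to-interface table; slot 0 is unused. */
static struct ex_ifnet *ex_ifindex2ifnet[EX_MAXIF];

/* Hypothetical packet header: records the receiving interface by index. */
struct ex_pkthdr {
	unsigned int rcvif_index;
};

/* Look up the receiving interface; returns NULL if the index is invalid
 * or the interface has been detached since the packet arrived, so every
 * caller must handle NULL. */
static struct ex_ifnet *
ex_get_rcvif(const struct ex_pkthdr *ph)
{
	if (ph->rcvif_index == 0 || ph->rcvif_index >= EX_MAXIF)
		return NULL;
	return ex_ifindex2ifnet[ph->rcvif_index];
}
```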


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can jump, e.g., via settimeofday(2), according to time_second(9). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally, it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
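The conversion described in the second paragraph amounts to shifting by the current offset between the two clocks. In this sketch, the kernel's time_uptime and time_second are passed in as parameters (`now_uptime`, `now_wall`) so that the arithmetic can be checked deterministically. The function names are hypothetical.

```c
#include <time.h>

/* Convert an expiry on the monotonic uptime clock to wall-clock time,
 * e.g. when reporting it to a userland program. */
static time_t
ex_uptime_to_wallclock(time_t expire_uptime, time_t now_uptime,
    time_t now_wall)
{
	return now_wall + (expire_uptime - now_uptime);
}

/* The reverse: a wall-clock expiry supplied by userland, stored on the
 * uptime clock so later settimeofday(2) jumps do not affect it. */
static time_t
ex_wallclock_to_uptime(time_t expire_wall, time_t now_uptime,
    time_t now_wall)
{
	return now_uptime + (expire_wall - now_wall);
}
```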


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
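The sketch below implements the conflict test during probing, following the two rules in RFC 5227 section 2.1.1. A conflict is any ARP packet whose sender IP is the address being probed, or an ARP Probe (sender IP 0.0.0.0) that targets that address from a different hardware address. `struct ex_arp` and `ex_dad_conflict()` are hypothetical, addresses are compared as raw 32-bit values, and the RFC's "any of the host's interfaces" is simplified to a single interface.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down view of a received ARP packet. */
struct ex_arp {
	uint8_t sha[6];		/* sender hardware address */
	uint32_t spa;		/* sender IPv4 address */
	uint32_t tpa;		/* target IPv4 address */
};

/* While probing `tentative`, decide whether a received ARP packet
 * signals an address conflict. */
static bool
ex_dad_conflict(const struct ex_arp *ah, uint32_t tentative,
    const uint8_t own_ha[6])
{
	/* Another host already uses the address. */
	if (ah->spa == tentative)
		return true;
	/* Another host is probing for the same address right now. */
	if (ah->spa == 0 && ah->tpa == tentative &&
	    memcmp(ah->sha, own_ha, 6) != 0)
		return true;
	return false;
}
```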


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.2; 1.319.4; 1.319.6; 1.319.10;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use an array to store hooks and thus be more cache
friendly (there are only a few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.
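One plausible way to model reserved ports is a bitmap that the allocator consults. The sketch below assumes that representation and uses hypothetical names. It is not the NetBSD implementation, and it ignores ports that are already bound.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per port number (hypothetical representation). */
static uint32_t ex_reserved[65536 / 32];

static void
ex_port_reserve(uint16_t port, bool on)
{
	if (on)
		ex_reserved[port / 32] |= UINT32_C(1) << (port % 32);
	else
		ex_reserved[port / 32] &= ~(UINT32_C(1) << (port % 32));
}

static bool
ex_port_is_reserved(uint16_t port)
{
	return (ex_reserved[port / 32] >> (port % 32)) & 1;
}

/* Return the first non-reserved port in [lo, hi] (lo > 0), or 0 if all
 * are reserved. A uint32_t counter avoids wrap-around when hi is 65535. */
static uint16_t
ex_pick_port(uint16_t lo, uint16_t hi)
{
	for (uint32_t p = lo; p <= hi; p++)
		if (!ex_port_is_reserved((uint16_t)p))
			return (uint16_t)p;
	return 0;
}
```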


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

branches: 1.298.2; 1.298.6; 1.298.8;
check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
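The deferral pattern can be sketched with a C11 atomic flag: the drain hook only records that work is needed, and the periodic handler performs the work in a context where that is safe. The names are hypothetical, and the cache being drained is reduced to a counter.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool ex_drain_needed;	/* static storage: starts false */
static int ex_cache_entries = 100;	/* stand-in for cached state */

/* May be called with locks held: do no real work, only set the flag. */
static void
ex_drain(void)
{
	atomic_store(&ex_drain_needed, true);
}

/* Periodic (fasttimo-like) handler: consume the flag and do the work. */
static void
ex_fasttimo(void)
{
	if (atomic_exchange(&ex_drain_needed, false))
		ex_cache_entries = 0;
}
```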


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
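A minimal sketch of what a windowed Fisher-Yates id generator can look like, assuming a permutation of the whole 16-bit id space and a cursor that advances one slot per id. Each call swaps the cursor slot with a random slot up to `window` positions ahead and hands out the result. The names (`struct idgen`, `idgen_next()`), the window size, and the xorshift64 generator, which stands in for arc4random and is not cryptographically secure, are all hypothetical; this is not the committed NetBSD code.

```c
#include <assert.h>
#include <stdint.h>

#define ID_SPACE 65536u /* every possible 16-bit IP id */

/* Hypothetical generator state; not the actual NetBSD structure. */
struct idgen {
	uint16_t perm[ID_SPACE]; /* current permutation of the id space */
	uint32_t cursor;         /* slot of the next id to hand out */
	uint32_t window;         /* how far ahead of the cursor we may pick */
	uint64_t rng;            /* xorshift64 state (placeholder, NOT crypto) */
};

/* Placeholder PRNG standing in for arc4random(). */
static uint64_t
xorshift64(uint64_t *s)
{
	uint64_t x = *s;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return *s = x;
}

static void
idgen_init(struct idgen *g, uint32_t window, uint64_t seed)
{
	assert(window >= 1 && window <= ID_SPACE);
	for (uint32_t i = 0; i < ID_SPACE; i++)
		g->perm[i] = (uint16_t)i;
	g->cursor = 0;
	g->window = window;
	g->rng = seed != 0 ? seed : 0x9e3779b97f4a7c15ull; /* state must be nonzero */
}

/*
 * One Fisher-Yates step restricted to a sliding window: swap the cursor
 * slot with a random slot in [cursor, cursor + window) (mod ID_SPACE),
 * emit the value now at the cursor, and advance the cursor.
 */
static uint16_t
idgen_next(struct idgen *g)
{
	uint32_t off = (uint32_t)(xorshift64(&g->rng) % g->window);
	uint32_t j = (g->cursor + off) % ID_SPACE;
	uint16_t tmp = g->perm[g->cursor];
	uint16_t id;

	g->perm[g->cursor] = g->perm[j];
	g->perm[j] = tmp;
	id = g->perm[g->cursor];
	g->cursor = (g->cursor + 1) % ID_SPACE;
	return id;
}
```

An id stays in the slot where it was handed out until the window sweeps over that slot again, so with this layout no id repeats within `ID_SPACE - window` consecutive calls; a larger window trades that guarantee for less predictability. The `% g->window` reduction carries a slight modulo bias that a real implementation would avoid.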


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement is visibly
correct "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent, though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or data structure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error with a datagram
packet that had an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
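The drop-half policy and the cached nmbclusters check described above can be sketched as follows. This is a simplified, hypothetical model rather than the NetBSD code: the `struct fragpkt` layout and function names are invented for illustration, and choosing victims in order of soonest-expiring TTL is an assumption of this sketch; the log only says that half of all fragments are dropped once ip_maxfrags (nmbclusters/4) is exceeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified reassembly bookkeeping; not NetBSD's structs. */
struct fragpkt {
	bool live;   /* still on the reassembly queue? */
	int  ttl;    /* remaining lifetime; lower expires sooner */
	int  nfrags; /* fragments held for this packet (cf. ipq_nfrags) */
};

static int nmbclusters_cache = -1; /* cf. ip_nmbclusters */
static int maxfrags;               /* cf. ip_maxfrags */

/* Recompute the derived limit only when nmbclusters has changed. */
static void
check_nmbclusters(int nmbclusters)
{
	if (nmbclusters != nmbclusters_cache) {
		nmbclusters_cache = nmbclusters;
		maxfrags = nmbclusters / 4;
	}
}

/*
 * Multiplicative drop: discard whole packets, soonest-to-expire first,
 * until at least half of all queued fragments are gone.
 * Returns the number of fragments dropped.
 */
static int
drop_half(struct fragpkt *q, size_t n)
{
	int total = 0, dropped = 0;

	for (size_t i = 0; i < n; i++)
		if (q[i].live)
			total += q[i].nfrags;

	while (2 * dropped < total) {
		struct fragpkt *victim = NULL;

		for (size_t i = 0; i < n; i++)
			if (q[i].live && (victim == NULL || q[i].ttl < victim->ttl))
				victim = &q[i];
		/* total - dropped > 0 fragments remain, so a live packet exists */
		assert(victim != NULL);
		victim->live = false;
		dropped += victim->nfrags;
	}
	return dropped;
}

/* On fragment arrival: drop half once the total exceeds the limit. */
static int
maybe_drop(struct fragpkt *q, size_t n, int nfrags_total, int nmbclusters)
{
	check_nmbclusters(nmbclusters);
	return nfrags_total > maxfrags ? drop_half(q, n) : 0;
}
```

Dropping whole packets rather than single fragments matters here: a datagram missing any fragment can never be reassembled, so keeping its other fragments would only waste mbufs.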


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.
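A minimal sketch of a hashtable of reassembly list heads, assuming entries are matched on source, destination, IP id, and protocol. The bucket count, the hash function, and the `struct reassq` layout are hypothetical and do not reproduce NetBSD's or FreeBSD's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPREASS_NHASH 64 /* hypothetical bucket count; must be a power of two */

/* Hypothetical reassembly-queue entry; the real struct ipq differs. */
struct reassq {
	struct reassq *next; /* next entry in the same bucket */
	uint32_t src, dst;   /* IPv4 source and destination addresses */
	uint16_t id;         /* IP id shared by all fragments */
	uint8_t  proto;      /* IP protocol number */
};

/* One list head per bucket, instead of a single global list. */
static struct reassq *reass_hash[IPREASS_NHASH];

/* Hypothetical hash: mix src and id; dst and proto are only compared. */
static unsigned
reass_bucket(uint32_t src, uint16_t id)
{
	return (unsigned)((src ^ (src >> 16) ^ id) & (IPREASS_NHASH - 1));
}

/* Walk only the bucket that the fragment hashes to. */
static struct reassq *
reass_lookup(uint32_t src, uint32_t dst, uint16_t id, uint8_t proto)
{
	struct reassq *q;

	for (q = reass_hash[reass_bucket(src, id)]; q != NULL; q = q->next)
		if (q->src == src && q->dst == dst && q->id == id &&
		    q->proto == proto)
			return q;
	return NULL;
}

/* Link a new entry at the head of its bucket. */
static void
reass_insert(struct reassq *q)
{
	unsigned b;

	assert(q != NULL &&
	    reass_lookup(q->src, q->dst, q->id, q->proto) == NULL);
	b = reass_bucket(q->src, q->id);
	q->next = reass_hash[b];
	reass_hash[b] = q;
}
```

Only the fragment's own bucket is scanned, so a lookup no longer walks every datagram under reassembly.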


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

backout previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
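A minimal sketch of the 127/8 check described above, assuming addresses in host byte order; the helper names are hypothetical and the special case of packets legitimately arriving on the loopback interface is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* 127.0.0.0/8 is the loopback network (host byte order assumed). */
static bool in_loopback_net(uint32_t addr)
{
    return (addr >> 24) == 127;
}

/* Hypothetical helper: true if neither address is a loopback address,
 * i.e. the packet is acceptable when received from the wire. */
static bool ip_addrs_wire_ok(uint32_t src, uint32_t dst)
{
    return !in_loopback_net(src) && !in_loopback_net(dst);
}
```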


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not a negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port numbers assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove_hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there, so we add them.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase an "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets that have a Class-D address as source address.
Implements the first half of PR 7003.
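A minimal sketch of the Class-D source test, assuming host byte order; the helper names are hypothetical. On BSD systems <netinet/in.h> provides an IN_MULTICAST() macro that performs the same 224.0.0.0/4 test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Class D (multicast) is 224.0.0.0/4: top four bits are 1110. */
static bool in_classd(uint32_t addr)
{
    return (addr & 0xf0000000u) == 0xe0000000u;
}

/* Hypothetical helper: a multicast address is never a valid source. */
static bool ip_src_acceptable(uint32_t src)
{
    return !in_classd(src);
}
```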


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
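A minimal sketch of the length sanity check, assuming ip_len has already been converted to host byte order; the helper name is hypothetical. ip_hl counts 32-bit words, so shifting left by 2 gives the header length in bytes, and a total length smaller than that cannot be a well-formed datagram.

```c
#include <stdbool.h>
#include <stdint.h>

/* hl: the 4-bit ip_hl field; ip_len: total length in host byte order. */
static bool ipv4_lengths_ok(uint8_t hl, uint16_t ip_len)
{
    unsigned hlen = (unsigned)hl << 2;   /* header length in bytes */

    if (hlen < 20)         /* shorter than the minimal IPv4 header */
        return false;
    if (ip_len < hlen)     /* total length smaller than the header */
        return false;
    return true;
}
```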


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
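A minimal sketch of the incremental checksum adjustment, in the style of RFC 1624, not the ipflow code itself. It assumes a 20-byte header held as ten 16-bit words, where word 4 holds TTL (high byte) and protocol, and word 5 holds the checksum; function names are hypothetical. Decrementing TTL lowers that word by 0x0100, so the one's-complement checksum rises by 0x0100.

```c
#include <stddef.h>
#include <stdint.h>

/* Full header checksum over n 16-bit words (checksum word zeroed first). */
static uint16_t ip_cksum(const uint16_t *words, size_t n)
{
    uint32_t s = 0;

    for (size_t i = 0; i < n; i++)
        s += words[i];
    while (s >> 16)
        s = (s & 0xffff) + (s >> 16);   /* fold carries back in */
    return (uint16_t)~s;
}

/* Decrement TTL and patch the checksum without re-summing the header. */
static void ip_ttl_decrement(uint16_t hdr[10])
{
    uint32_t s;

    hdr[4] -= 0x0100;                   /* caller guarantees TTL > 0 */
    s = (uint32_t)hdr[5] + 0x0100;      /* sum fell by 0x0100, so ~sum rises */
    s = (s & 0xffff) + (s >> 16);       /* end-around carry */
    hdr[5] = (uint16_t)s;
}
```

In one rare case the incremental result is 0xffff where a full recompute yields 0x0000; both are one's-complement zero and verify identically.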


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
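A sketch of the range rules and the "cycle through the set once" idea above. The commit only states the constraints; the upward search order, the caller-supplied in_use table, and the names anonport_range_ok(), anonport_pick() and SKETCH_PORT_RESERVED are hypothetical, and the IPNOPRIVPORTS exception is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PORT_RESERVED 1024   /* same value as IPPORT_RESERVED */

/* Reject ranges below the reserved ports, above 65535, or inverted. */
static bool anonport_range_ok(int min, int max)
{
    return min >= SKETCH_PORT_RESERVED && max <= 65535 && min < max;
}

/* Next free port after *last in [min, max], wrapping at most once so each
 * candidate is tried exactly one time. Returns -1 if all are in use. */
static int anonport_pick(uint16_t min, uint16_t max, uint16_t *last,
                         const bool *in_use)
{
    uint32_t span = (uint32_t)max - min + 1;
    uint16_t port = *last;

    for (uint32_t tried = 0; tried < span; tried++) {
        if (port < min || port >= max)
            port = min;             /* wrap around to the bottom */
        else
            port++;
        if (!in_use[port]) {
            *last = port;
            return port;
        }
    }
    return -1;
}
```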


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
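A minimal sketch of masking ip_off so a set reserved bit does not make an unfragmented datagram look like a fragment. The mask values match those in <netinet/ip.h> and are redefined here so the sketch stands alone; ip_off is assumed to be in host byte order, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_RF      0x8000u   /* reserved fragment flag, must be zero */
#define IP_DF      0x4000u   /* don't fragment */
#define IP_MF      0x2000u   /* more fragments */
#define IP_OFFMASK 0x1fffu   /* fragment offset, in 8-byte units */

/* A fragment has MF set or a nonzero offset; RF and DF are ignored. */
static bool ip_is_fragment(uint16_t ip_off)
{
    return (ip_off & (IP_MF | IP_OFFMASK)) != 0;
}

/* Byte offset of this fragment's data within the original datagram. */
static unsigned ip_frag_offset_bytes(uint16_t ip_off)
{
    return (unsigned)(ip_off & IP_OFFMASK) << 3;
}
```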


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlaid with a
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.
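
A simplified sketch of the queue-element idea. The type names are hypothetical; the real structure lived in the reassembly code. The linkage pointers live in a separate element that points at the header, so the header stays a fixed-size wire-format overlay on every architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the wire-format IPv4 header: only fixed-size integer
 * fields, so its layout is the same on 32- and 64-bit machines. */
struct ip_hdr_sketch {
    uint8_t  ip_vhl;
    uint8_t  ip_tos;
    uint16_t ip_len;
    uint16_t ip_id;
    uint16_t ip_off;   /* host-order fragment offset in this sketch */
    /* remaining header fields omitted */
};

/* Hypothetical queue element: the pointers live here, not in the header. */
struct frag_qent {
    struct frag_qent     *next;
    struct ip_hdr_sketch *ip;
};

/* Insert qe into a singly linked list kept sorted by fragment offset. */
static void frag_insert(struct frag_qent **head, struct frag_qent *qe)
{
    struct frag_qent **pp = head;

    while (*pp != NULL && (*pp)->ip->ip_off < qe->ip->ip_off)
        pp = &(*pp)->next;
    qe->next = *pp;
    *pp = qe;
}
```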


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.
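
One way to avoid run-time byte swapping, sketched under the assumption that addresses stay in network byte order as they arrive in the header. The constant is converted once instead of calling ntohl() on each packet's address. is_loopback_net() is a hypothetical example.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Addresses from the header stay in network byte order; the constants are
 * converted instead, which compilers can typically fold at build time. */
static int is_loopback_net(struct in_addr a)
{
    /* 127.0.0.0/8 */
    return (a.s_addr & htonl(0xff000000u)) == htonl(0x7f000000u);
}
```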


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS IDs, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect, but at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.401 08-Mar-2021 christos

remove now unused pseudo-random ip id code.


# 1.400 07-Mar-2021 christos

netinet: Enable random IP fragment ids by default (from riastradh)


# 1.399 19-Feb-2021 christos

- Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
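
A portable C11 sketch of the macro described above. It uses _Alignof in place of the kernel's __alignof. The name ALIGNED_POINTER_SKETCH is hypothetical, chosen so it does not clash with the real macro, and struct hdr_sketch is an illustrative type.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: a pointer is suitably aligned for type t if its address is a
 * multiple of t's ABI alignment. _Alignof also works for structs, whose
 * alignment can be smaller than their size. */
#define ALIGNED_POINTER_SKETCH(p, t) \
    ((((uintptr_t)(p)) & (_Alignof(t) - 1)) == 0)

/* Illustrative header-like struct: typically 12 bytes, 4-byte aligned. */
struct hdr_sketch { uint32_t a; uint16_t b; uint16_t c; uint8_t d; };
```

On common ABIs sizeof(struct hdr_sketch) is 12 while its alignment is 4. A sizeof-based mask would be 11, which is not a valid alignment mask.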


# 1.398 14-Feb-2021 christos

- centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.


Revision tags: thorpej-futex-base
# 1.397 28-Aug-2020 ozaki-r

inet: reduce silent packet discards


# 1.396 28-Aug-2020 ozaki-r

inet: pull m_get_rcvif_psref out of ip_input for simplicity

Same as ip6_input.


# 1.395 28-Aug-2020 ozaki-r

ipsec: rename ipsec_ip_input to ipsec_ip_input_checkpolicy

Because it just checks if a packet passes security policies.


# 1.394 28-Aug-2020 ozaki-r

inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.393 13-Nov-2019 ozaki-r

Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.


# 1.392 19-Sep-2019 ozaki-r

Apply some missing changes lost on the previous commit


# 1.391 19-Sep-2019 ozaki-r

Avoid having a rtcache directly in a percpu storage

percpu(9) keeps a memory area for each CPU and hands it out piece by piece to
users. If the areas run short, percpu(9) enlarges them by allocating new,
larger areas, replacing the old ones with them and destroying the old ones.
So a percpu storage referenced through a pointer obtained via percpu_getref
can be destroyed by this mechanism after the running thread sleeps, even if
percpu_putref has not been called yet.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock, so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by storing
just a pointer to a rtcache in the percpu storage instead.

Reviewed by knakahara@ and yamaguchi@
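
A user-space analogy of the fix, not percpu(9) itself; all names here are hypothetical. The per-CPU area holds only pointers. When the area is reallocated and the old one freed, a rtcache pointer taken before the move still refers to live memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a route cache. */
struct rtcache_sketch { int dst; };

/* Hypothetical per-CPU area that holds only pointers to rtcaches. */
struct percpu_area {
    struct rtcache_sketch **slots;
    size_t nslots;
};

/* Grow the area to n slots (n must be >= pa->nslots). The slot array may
 * move and the old one is freed, but the rtcache objects do not move. */
static int percpu_area_grow(struct percpu_area *pa, size_t n)
{
    struct rtcache_sketch **ns = calloc(n, sizeof(*ns));

    if (ns == NULL)
        return -1;
    if (pa->nslots > 0)
        memcpy(ns, pa->slots, pa->nslots * sizeof(*ns));
    free(pa->slots);  /* the old area is destroyed */
    pa->slots = ns;
    pa->nslots = n;
    return 0;
}
```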


# 1.390 15-Sep-2019 bouyer

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.389 13-May-2019 ozaki-r

branches: 1.389.2;
Count packets dropped by pfil


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.388 17-Jan-2019 knakahara

Fix ipsecif(4) being unable to apply input-direction packet filters. Reviewed by ozaki-r@n.o and ryo@n.o.

Add ATF later.


Revision tags: pgoyette-compat-1226 pgoyette-compat-1126
# 1.387 15-Nov-2018 maxv

Remove the 't' argument from m_tag_find().


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.386 02-Sep-2018 maxv

remove reference to ipnat, and duplicate comments


Revision tags: pgoyette-compat-0728
# 1.385 10-Jul-2018 maxv

Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.384 17-May-2018 maxv

branches: 1.384.2;
Add KASSERTs, related to PR/39794.


# 1.383 14-May-2018 maxv

Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.


# 1.382 10-May-2018 maxv

Rename ipsec4_forward -> ipsec_mtu, and switch to void.


Revision tags: pgoyette-compat-0502
# 1.381 26-Apr-2018 maxv

Remove unused mbuf argument from sbsavetimestamp.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.380 15-Apr-2018 maxv

Introduce a m_verify_packet function, that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.
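
A much-simplified sketch of the kind of invariant such a check enforces, using a hypothetical two-field mbuf. The data lengths along the chain must add up to the packet header's total length. The real m_verify_packet() checks more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified mbuf: just enough to show the check. */
struct mbuf_sketch {
    struct mbuf_sketch *m_next;
    int m_len;        /* bytes of data in this mbuf */
    int pkthdr_len;   /* total packet length; meaningful in the head only */
};

/* Return 0 if the chain is consistent, -1 otherwise: every m_len must be
 * non-negative and the lengths must add up to the header's total. */
static int verify_packet_sketch(const struct mbuf_sketch *m)
{
    int total = 0;

    if (m == NULL)
        return -1;
    for (const struct mbuf_sketch *n = m; n != NULL; n = n->m_next) {
        if (n->m_len < 0)
            return -1;
        total += n->m_len;
    }
    return total == m->pkthdr_len ? 0 : -1;
}
```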


# 1.379 11-Apr-2018 maxv

Don't pass IP_ALLOWBROADCAST in ipsec4_input. The flag lands in
ipsec_getpolicybyaddr, and only IP_FORWARDING is taken.

In fact it would be good to change the 'flags' argument of ipsec4_input
to be a boolean, same for ipsec_getpolicybyaddr. It would be less
misleading.


# 1.378 11-Apr-2018 maxv

Add comment about IPsec.


# 1.377 11-Apr-2018 maxv

Small changes in ip_dooptions: replace bcopy by memcpy, the areas can't
overlap.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.376 24-Feb-2018 ozaki-r

branches: 1.376.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because the lock ordering rule is violated: softnet_lock
is taken while IFNET_LOCK is held.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to reliably protect the network stack.
This is required, for example, for PR kern/51356.

Fix PR kern/53043


# 1.375 09-Feb-2018 maxv

Remove dead code.


# 1.374 07-Feb-2018 maxv

Remove null check on ip, it can't be null. (Confuses code scanners.)


# 1.373 06-Feb-2018 maxv

Typos and style a bit, no functional change.


# 1.372 05-Feb-2018 maxv

Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.


# 1.371 05-Feb-2018 maxv

Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.


# 1.370 05-Feb-2018 maxv

Clean up this mess. This is typically the kind of places where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.


# 1.369 05-Feb-2018 maxv

Be tougher, and don't allow LSRR+SSRR (RFC7126).


# 1.368 05-Feb-2018 maxv

Kick duplicate options, they are not allowed (RFC791).
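
A minimal sketch of an option scan that enforces both this rule and the LSRR+SSRR rule from 1.369. It assumes the RFC 791 option layout: a type byte, then a length byte for every option except EOL and NOP. It is not NetBSD's ip_dooptions(). In particular, rejecting every repeated non-NOP option is an assumption about the exact scope of the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL   0    /* end of option list */
#define OPT_NOP   1    /* no operation, may repeat */
#define OPT_LSRR  131  /* loose source and record route */
#define OPT_SSRR  137  /* strict source and record route */

/* Return true if the options are acceptable: well-formed, no non-NOP
 * option type appears twice, and LSRR and SSRR are not both present. */
static bool ip_opts_acceptable(const uint8_t *cp, size_t cnt)
{
    bool seen[256] = { false };

    while (cnt > 0) {
        uint8_t opt = cp[0];
        size_t optlen;

        if (opt == OPT_EOL)
            break;
        if (opt == OPT_NOP) {
            optlen = 1;
        } else {
            if (cnt < 2)
                return false;      /* no room for a length byte */
            optlen = cp[1];
            if (optlen < 2 || optlen > cnt)
                return false;      /* malformed length */
            if (seen[opt])
                return false;      /* duplicate option */
            seen[opt] = true;
        }
        cp += optlen;
        cnt -= optlen;
    }
    return !(seen[OPT_LSRR] && seen[OPT_SSRR]);
}
```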


# 1.367 05-Feb-2018 maxv

Remove unused variable.


# 1.366 05-Feb-2018 maxv

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.


# 1.365 05-Feb-2018 maxv

Style, no functional change.


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and add a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies of the SO_TIMESTAMP/SO_OTIMESTAMP processing into a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, e.g. by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is split
out to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed at any time
during packet processing, and accessing such an object via a pointer is
racy. Instead we have to get the object from the interface collection
(ifindex2ifnet) via an interface index (if_index) that is stored in the
mbuf instead of a pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
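
A sketch of the index-instead-of-pointer pattern, with hypothetical names (ifnet_sketch, pkt_get_rcvif). The packet stores only if_index, and the interface is looked up when needed; NULL means it has gone away. The real m_get_rcvif_psref() also takes a psref(9) reference to keep the interface alive while it is in use, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IFS 8

/* Hypothetical interface object and index table (cf. ifindex2ifnet). */
struct ifnet_sketch { unsigned if_index; const char *if_xname; };
static struct ifnet_sketch *ifindex_table[MAX_IFS];

/* The packet records only the index of the receiving interface. */
struct pkt_sketch { unsigned rcvif_index; };

/* Look the interface up at use time; NULL means it has been detached, so
 * the caller must drop the packet instead of touching freed memory. */
static struct ifnet_sketch *pkt_get_rcvif(const struct pkt_sketch *p)
{
    if (p->rcvif_index == 0 || p->rcvif_index >= MAX_IFS)
        return NULL;
    return ifindex_table[p->rcvif_index];
}
```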


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can jump, e.g. via settimeofday(2), according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
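
A user-space sketch of the conversion described above. It assumes CLOCK_MONOTONIC stands in for time_uptime and CLOCK_REALTIME for time_second. Deadlines are kept on the monotonic clock and shifted by the current offset between the two clocks only when handed to, or received from, userland.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Convert a deadline on the uptime clock to wall-clock seconds. */
static time_t uptime_to_wall(time_t t_uptime, time_t now_uptime, time_t now_wall)
{
    return now_wall + (t_uptime - now_uptime);
}

/* Convert a wall-clock time from userland back to the uptime clock. */
static time_t wall_to_uptime(time_t t_wall, time_t now_uptime, time_t now_wall)
{
    return now_uptime + (t_wall - now_wall);
}

/* Read both clocks: CLOCK_MONOTONIC plays the role of time_uptime and
 * CLOCK_REALTIME that of time_second. */
static int read_clocks(time_t *uptime, time_t *wall)
{
    struct timespec m, r;

    if (clock_gettime(CLOCK_MONOTONIC, &m) != 0 ||
        clock_gettime(CLOCK_REALTIME, &r) != 0)
        return -1;
    *uptime = m.tv_sec;
    *wall = r.tv_sec;
    return 0;
}
```

A cache entry set to expire at uptime + timeout is unaffected if settimeofday(2) later moves the wall clock.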


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.2; 1.319.4; 1.319.6; 1.319.10;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by a sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on IPsec being enabled and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only a few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

branches: 1.298.2; 1.298.6; 1.298.8;
check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.
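A minimal sketch of the receiving side follows. It assumes, as on FreeBSD (which this was modeled after), that the TTL arrives as a single byte in a control message whose cmsg_type is IP_RECVTTL. Other systems may use a different type or an int, so check ip(4) on the target system. The helper name recv_ttl_from_cmsg() is hypothetical. The caller enables the option with setsockopt(s, IPPROTO_IP, IP_RECVTTL, &on, sizeof(on)) and gives recvmsg(2) a msg_control buffer of at least CMSG_SPACE(sizeof(uint8_t)) bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: walk the ancillary data filled in by recvmsg(2)
 * and return the TTL from an IP_RECVTTL control message, or -1 if the
 * message carried none.  Assumes the BSD layout: one byte of TTL.
 */
static int
recv_ttl_from_cmsg(struct msghdr *msg)
{
	struct cmsghdr *cm;

	assert(msg != NULL);
	for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == IPPROTO_IP &&
		    cm->cmsg_type == IP_RECVTTL &&
		    cm->cmsg_len >= CMSG_LEN(sizeof(uint8_t))) {
			uint8_t ttl;

			/* CMSG_DATA() may be unaligned; copy rather than cast. */
			memcpy(&ttl, CMSG_DATA(cm), sizeof(ttl));
			return ttl;
		}
	}
	return -1;
}
```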


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
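The idea of per-CPU counters that are only summed when someone asks for them can be sketched in userland as below. This is a hypothetical stand-in with a fixed CPU count and made-up names (ipstat_inc(), ipstat_collate(), the IPSTAT_* indices); the kernel uses its own per-CPU machinery. It also reflects the 1.264 change, where the statistics are an array of uint64_t indexed by a stat number.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stat indices, in the spirit of an array-of-uint64_t ipstat. */
enum { IPSTAT_TOTAL, IPSTAT_BADSUM, IPSTAT_DELIVERED, IPSTAT_NSTATS };

#define NCPU_SKETCH 4	/* assumed fixed CPU count for the sketch */

/* One private row of counters per CPU; a CPU only touches its own row. */
static uint64_t ipstat_percpu[NCPU_SKETCH][IPSTAT_NSTATS];

static void
ipstat_inc(unsigned cpu, unsigned stat)
{
	assert(cpu < NCPU_SKETCH && stat < IPSTAT_NSTATS);
	ipstat_percpu[cpu][stat]++;
}

/* Collate: sum all CPUs' rows into out[], as a sysctl handler would. */
static void
ipstat_collate(uint64_t out[IPSTAT_NSTATS])
{
	memset(out, 0, IPSTAT_NSTATS * sizeof(out[0]));
	for (unsigned c = 0; c < NCPU_SKETCH; c++)
		for (unsigned s = 0; s < IPSTAT_NSTATS; s++)
			out[s] += ipstat_percpu[c][s];
}
```

The fast path (an increment) stays cheap and contention-free, while the rare read pays for walking every CPU's row.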


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
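The log does not spell the scheme out, so the following is only a hypothetical sketch of one way to realize "a Fisher-Yates shuffle over a sliding window"; it is not NetBSD's code. A table holding every 16-bit ID is shuffled once; each call returns the entry at a moving cursor and swaps it into a random slot up to WINDOW-1 positions behind the cursor. Because a returned ID is pushed back behind the cursor, it cannot be returned again until at least NIDS - WINDOW + 1 more calls have been made. rand() is a non-cryptographic stand-in for the kernel's random source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NIDS   65536u	/* every 16-bit ID */
#define WINDOW 32768u	/* hypothetical reinsertion window */

static uint16_t id_table[NIDS];
static uint32_t id_cursor;

/* Stand-in for a kernel random number source; NOT secure. */
static uint32_t
sketch_random(void)
{
	return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
}

/* "Inside-out" Fisher-Yates: build a random permutation of 0..NIDS-1. */
static void
id_init(void)
{
	for (uint32_t i = 0; i < NIDS; i++) {
		uint32_t j = sketch_random() % (i + 1);	/* slight modulo bias */

		id_table[i] = id_table[j];
		id_table[j] = (uint16_t)i;
	}
	id_cursor = 0;
}

/* Return the ID at the cursor, re-shuffling it into the window behind. */
static uint16_t
id_next(void)
{
	uint32_t i = id_cursor % NIDS;
	uint32_t j = (id_cursor + NIDS - sketch_random() % WINDOW) % NIDS;
	uint16_t r = id_table[i];

	id_table[i] = id_table[j];
	id_table[j] = r;
	id_cursor++;	/* 2^32 is a multiple of NIDS, so wraparound is safe */
	return r;
}
```

Larger windows make the sequence harder to predict, at the cost of a shorter guaranteed interval before an ID can repeat.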


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
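The checks from 1.258 and 1.259 can be sketched as a single predicate. This is a hypothetical helper (frag_acceptable() and IP_MINFRAGSIZE are made-up names), not the code in ip_input.c. The value 68 is the datagram size that RFC 791 requires every module to forward without further fragmentation. Here off_bytes is the fragment offset in bytes (the 13-bit ip_off field times 8), and ip_len is the fragment's total length including its IP header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IP_MINFRAGSIZE 68	/* hypothetical name for the RFC 791 minimum */

/*
 * Hypothetical sketch: reject a non-initial fragment that starts inside
 * the first 68 bytes (it could overwrite transport headers), and reject
 * a first fragment that is itself shorter than 68 bytes.
 */
static bool
frag_acceptable(uint32_t off_bytes, uint32_t ip_len, bool more_frags)
{
	if (off_bytes != 0)
		return off_bytes >= IP_MINFRAGSIZE;
	if (more_frags)		/* offset 0 with MF set: the first fragment */
		return ip_len >= IP_MINFRAGSIZE;
	return true;		/* offset 0, MF clear: not a fragment at all */
}
```

Since offsets are multiples of 8, the smallest acceptable non-initial offset under this check is 72 bytes.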


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement is evidently
correct "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache; otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
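The chaining scheme described above can be sketched with the BSD <sys/queue.h> list macros. The struct and function names below (route_sketch, sk_rtcache(), sk_rtflush(), sk_rtflushall()) are hypothetical stand-ins for struct route, in_rtcache(), in_rtflush() and in_rtflushall(); the reference count models the rtentry reference a cache holds.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct rtentry_sketch { int refcnt; };		/* hypothetical rtentry */

struct route_sketch {
	struct rtentry_sketch *ro_rt;		/* cached route, or NULL */
	LIST_ENTRY(route_sketch) ro_rtcache_next;
};

/* Every cache with ro_rt != NULL is on this chain. */
static LIST_HEAD(, route_sketch) in_rtcache_head =
    LIST_HEAD_INITIALIZER(in_rtcache_head);

/* Like in_rtcache(): take a reference and chain the cache. */
static void
sk_rtcache(struct route_sketch *ro, struct rtentry_sketch *rt)
{
	assert(ro->ro_rt == NULL);
	rt->refcnt++;
	ro->ro_rt = rt;
	LIST_INSERT_HEAD(&in_rtcache_head, ro, ro_rtcache_next);
}

/* Like in_rtflush(): drop the reference, set ro_rt to NULL, unchain. */
static void
sk_rtflush(struct route_sketch *ro)
{
	if (ro->ro_rt == NULL)
		return;
	ro->ro_rt->refcnt--;
	ro->ro_rt = NULL;
	LIST_REMOVE(ro, ro_rtcache_next);
}

/* Like in_rtflushall(): invalidate every chained cache. */
static void
sk_rtflushall(void)
{
	struct route_sketch *ro;

	while ((ro = LIST_FIRST(&in_rtcache_head)) != NULL)
		sk_rtflush(ro);
}
```

Users of a cache then only need to re-check ro_rt for NULL before use, which is the expectation the log describes for ipflow_fastforward() and the other callers.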

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or data structure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error() with a
datagram whose checksum was incorrect. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0
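Why the signedness matters: with an unsigned index, a test like i >= 0 is always true, so a count-down loop never terminates normally; it wraps around to a huge value instead. The hypothetical helper below shows the safe pattern with a signed index. It is a generic illustration, not the code of the function named above.

```c
#include <assert.h>

/*
 * Hypothetical helper: index of the last element below limit, or -1.
 * i must be signed: if it were unsigned, "i >= 0" could never be false,
 * and decrementing past 0 would wrap instead of ending the loop.
 */
static int
last_below(const int *v, int n, int limit)
{
	int i;

	for (i = n - 1; i >= 0; i--)
		if (v[i] < limit)
			break;
	return i;	/* -1 when no element qualifies */
}
```

Compilers such as GCC can warn about the unsigned form with -Wtype-limits ("comparison is always true").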


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
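The cached-copy check and the drop-half threshold described above can be sketched as follows. The globals mirror the names in the log message, but check_nmbcluster_params() and frags_to_drop() are hypothetical helpers. The real code walks the reassembly queues and frees fragments, and it also reacts to mbuf pressure, which this sketch does not model.

```c
#include <assert.h>

/* Userland stand-ins for the kernel variables named in the log. */
static int nmbclusters = 1024;	/* may be changed at run time */
static int ip_nmbclusters;	/* cached copy of nmbclusters */
static int ip_maxfrags;		/* threshold for the drop-half phase */
static int ip_nfrags;		/* fragments currently queued */

/* Recompute derived limits only when nmbclusters has changed. */
static void
check_nmbcluster_params(void)
{
	if (ip_nmbclusters != nmbclusters) {
		ip_nmbclusters = nmbclusters;
		ip_maxfrags = ip_nmbclusters / 4;
	}
}

/*
 * Hypothetical helper: how many fragments to discard now.  Above the
 * threshold, drop half of all queued fragments (multiplicative drop).
 */
static int
frags_to_drop(void)
{
	check_nmbcluster_params();
	if (ip_nfrags > ip_maxfrags)
		return ip_nfrags / 2;
	return 0;
}
```

Halving rather than trimming to the threshold is what lets the queue recover quickly under sustained fragment floods.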


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.
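A hashed reassembly table can be sketched as an array of list heads, with a lookup that only scans one bucket. Everything here is hypothetical: the bucket count, the ipq_sketch fields, and especially the hash mix in ipq_hashval(), which is not the function used by NetBSD or FreeBSD. A fragment queue is identified by source, destination, IP ID and protocol, as RFC 791 prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

#define IPQ_NHASH_SK 64			/* hypothetical; power of two */
#define IPQ_HMASK_SK (IPQ_NHASH_SK - 1)

struct ipq_sketch {
	uint32_t src, dst;		/* addresses, host byte order here */
	uint16_t id;
	uint8_t proto;
	LIST_ENTRY(ipq_sketch) ipq_q;
};

/* Zero-initialized heads are empty lists. */
static LIST_HEAD(, ipq_sketch) ipq_hash[IPQ_NHASH_SK];

/* Hypothetical hash mix over source address and IP ID. */
static unsigned
ipq_hashval(uint32_t src, uint16_t id)
{
	return (src ^ (src >> 16) ^ id) & IPQ_HMASK_SK;
}

static void
ipq_insert(struct ipq_sketch *q)
{
	LIST_INSERT_HEAD(&ipq_hash[ipq_hashval(q->src, q->id)], q, ipq_q);
}

/* Scan only the matching bucket instead of one global list. */
static struct ipq_sketch *
ipq_lookup(uint32_t src, uint32_t dst, uint16_t id, uint8_t proto)
{
	struct ipq_sketch *q;

	LIST_FOREACH(q, &ipq_hash[ipq_hashval(src, id)], ipq_q)
		if (q->src == src && q->dst == dst &&
		    q->id == id && q->proto == proto)
			return q;
	return NULL;
}
```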


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

backout previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
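
A sketch of the *_HDR_ALIGNED_P() idea, assuming 4-byte alignment is what
the IPv4 header's 32-bit fields need. The macro below approximates
IP_HDR_ALIGNED_P() and is not the exact NetBSD definition.

```c
#include <stdint.h>

/*
 * On CPUs that tolerate unaligned loads (__NO_STRICT_ALIGNMENT defined)
 * the test collapses to the constant 1, so callers pay nothing;
 * otherwise check that the pointer is 4-byte aligned.
 */
#ifdef __NO_STRICT_ALIGNMENT
#define IP_HDR_ALIGNED_P(p)	1
#else
#define IP_HDR_ALIGNED_P(p)	((((uintptr_t)(const void *)(p)) & 3) == 0)
#endif
```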


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variables/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
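
The cached pseudo-header value is a partial one's-complement sum over the
source address, destination address, protocol and upper-layer length. A
minimal sketch, with the hypothetical helper pseudo_hdr_sum() taking all
values in host byte order; NetBSD's own helper and its byte-order
conventions may differ.

```c
#include <stdint.h>

/*
 * One's-complement sum of the IPv4 pseudo-header, carries folded but
 * NOT complemented, so it can later be extended with the sum over the
 * TCP/UDP header and payload.
 */
static uint16_t
pseudo_hdr_sum(uint32_t src, uint32_t dst, uint8_t proto, uint16_t ulen)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;		/* zero byte followed by the protocol byte */
	sum += ulen;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

Finishing the checksum means adding the segment's own sum to this value,
folding carries again, and complementing the result; that last step is
what either the hardware or ip_output() would perform.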


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
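
A sketch of the 127/8 check, with addresses in host byte order. The
exemption for packets that arrived on a loopback interface is an
assumption drawn from the "must not appear on wire" wording, and
rcvif_is_loopback is a hypothetical flag.

```c
#include <stdint.h>

/* True for 127.0.0.0/8. */
static int
in_is_loopback(uint32_t a)
{
	return (a & 0xff000000u) == 0x7f000000u;
}

/*
 * Drop if either address is in 127/8, unless the packet came in on a
 * loopback interface (where such addresses are legitimate).
 */
static int
ip_drop_loopback(uint32_t src, uint32_t dst, int rcvif_is_loopback)
{
	if (rcvif_is_loopback)
		return 0;
	return in_is_loopback(src) || in_is_loopback(dst);
}
```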


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove_hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL
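
A sketch of how such a knob changes a subnet broadcast test, using a
hypothetical helper in_subnet_broadcast() with all addresses in host byte
order. The all-ones host part is always broadcast; the all-zeros host part
counts only while hostzerobroadcast is nonzero.

```c
#include <stdint.h>

static int
in_subnet_broadcast(uint32_t addr, uint32_t net, uint32_t mask,
    int hostzerobroadcast)
{
	if ((addr & mask) != net)
		return 0;			/* not on this subnet */
	if ((addr & ~mask) == ~mask)
		return 1;			/* host part all ones */
	return hostzerobroadcast && (addr & ~mask) == 0;
}
```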


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
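
A sketch of the length check, using a hypothetical ip_hdr_len_ok() with
ip_len in host byte order. ip_hl counts 32-bit words, so it is shifted left
by 2 to get bytes. The minimum-header test is an extra assumption, not part
of this log entry.

```c
#include <stdint.h>

/* Return 1 if the header-length fields are self-consistent. */
static int
ip_hdr_len_ok(uint8_t ip_hl, uint16_t ip_len)
{
	unsigned hlen = (unsigned)ip_hl << 2;	/* words -> bytes */

	if (hlen < 20)		/* shorter than a minimal IPv4 header */
		return 0;
	if (ip_len < hlen)	/* total length cannot be below header length */
		return 0;
	return 1;
}
```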


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
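
One standard way to adjust a checksum incrementally is RFC 1624,
equation 3: HC' = ~(~HC + ~m + m'), where m and m' are the old and new
values of the changed 16-bit word. The log does not say which formula the
ipflow code uses; this sketch treats the header as an array of host-order
16-bit words, and both helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Complemented one's-complement sum over nwords 16-bit words. */
static uint16_t
ip_cksum(const uint16_t *w, size_t nwords)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < nwords; i++)
		sum += w[i];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: new checksum after one word changes m_old -> m_new. */
static uint16_t
cksum_adjust(uint16_t hc, uint16_t m_old, uint16_t m_new)
{
	uint32_t sum = (uint16_t)~hc;

	sum += (uint16_t)~m_old;
	sum += m_new;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

For a TTL decrement, the changed word is header word 4 (TTL in the high
byte, protocol in the low byte), so m' = m - 0x0100 and the rest of the
header never has to be re-summed.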


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
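
A sketch of cycling through the ephemeral range exactly once, assuming
anonportmin < anonportmax as required above. anonport_pick() and the
inuse[] array (a stand-in for a real PCB lookup) are hypothetical; the
real in_pcbbind() code differs in detail.

```c
#include <stdint.h>

/*
 * Walk downward from *lastport through [min, max], wrapping to max at
 * most once, so each port is tried exactly once. inuse[p - min] != 0
 * marks port p as taken. Returns 0 if the whole range is busy.
 */
static uint16_t
anonport_pick(uint16_t min, uint16_t max, uint16_t *lastport,
    const unsigned char *inuse)
{
	uint32_t count = (uint32_t)max - min + 1;
	uint16_t port = *lastport;

	while (count-- > 0) {
		if (port <= min || port > max)
			port = max;	/* start, or wrap, at the top */
		else
			port--;
		if (!inuse[port - min]) {
			*lastport = port;
			return port;
		}
	}
	return 0;
}
```

Bounding the loop by the size of the range, rather than by "until we get
back to where we started", is what makes it safe to later swap in a random
starting point.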


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag would always be zero in received packets. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect, though; at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.400 07-Mar-2021 christos

netinet: Enable random IP fragment ids by default (from riastradh)


# 1.399 19-Feb-2021 christos

- Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]


# 1.398 14-Feb-2021 christos

- centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 lt2p where it was aligning for ipv4 and pulling
for ipv6.


Revision tags: thorpej-futex-base
# 1.397 28-Aug-2020 ozaki-r

inet: reduce silent packet discards


# 1.396 28-Aug-2020 ozaki-r

inet: pull m_get_rcvif_psref out of ip_input for simplicity

Same as ip6_input.


# 1.395 28-Aug-2020 ozaki-r

ipsec: rename ipsec_ip_input to ipsec_ip_input_checkpolicy

Because it just checks if a packet passes security policies.


# 1.394 28-Aug-2020 ozaki-r

inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.393 13-Nov-2019 ozaki-r

Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.


# 1.392 19-Sep-2019 ozaki-r

Apply some missing changes lost on the previous commit


# 1.391 19-Sep-2019 ozaki-r

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.390 15-Sep-2019 bouyer

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.389 13-May-2019 ozaki-r

branches: 1.389.2;
Count packets dropped by pfil


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.388 17-Jan-2019 knakahara

Fix ipsecif(4) cannot apply input direction packet filter. Reviewed by ozaki-r@n.o and ryo@n.o.

Add ATF later.


Revision tags: pgoyette-compat-1226 pgoyette-compat-1126
# 1.387 15-Nov-2018 maxv

Remove the 't' argument from m_tag_find().


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.386 02-Sep-2018 maxv

remove reference to ipnat, and duplicate comments


Revision tags: pgoyette-compat-0728
# 1.385 10-Jul-2018 maxv

Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.384 17-May-2018 maxv

branches: 1.384.2;
Add KASSERTs, related to PR/39794.


# 1.383 14-May-2018 maxv

Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.


# 1.382 10-May-2018 maxv

Rename ipsec4_forward -> ipsec_mtu, and switch to void.


Revision tags: pgoyette-compat-0502
# 1.381 26-Apr-2018 maxv

Remove unused mbuf argument from sbsavetimestamp.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.380 15-Apr-2018 maxv

Introduce a m_verify_packet function, that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.


# 1.379 11-Apr-2018 maxv

Don't pass IP_ALLOWBROADCAST in ipsec4_input. The flag lands in
ipsec_getpolicybyaddr, and only IP_FORWARDING is taken.

In fact it would be good to change the 'flags' argument of ipsec4_input
to be a boolean, same for ipsec_getpolicybyaddr. It would be less
misleading.


# 1.378 11-Apr-2018 maxv

Add comment about IPsec.


# 1.377 11-Apr-2018 maxv

Small changes in ip_dooptions: replace bcopy by memcpy, the areas can't
overlap.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.376 24-Feb-2018 ozaki-r

branches: 1.376.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is held with hodling IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to protect the network stack surely.
This is required, for example, for PR kern/51356.

Fix PR kern/53043


# 1.375 09-Feb-2018 maxv

Remove dead code.


# 1.374 07-Feb-2018 maxv

Remove null check on ip, it can't be null. (Confuses code scanners.)


# 1.373 06-Feb-2018 maxv

Typos and style a bit, no functional change.


# 1.372 05-Feb-2018 maxv

Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.


# 1.371 05-Feb-2018 maxv

Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.


# 1.370 05-Feb-2018 maxv

Clean up this mess. This is typically the kind of places where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.


# 1.369 05-Feb-2018 maxv

Be tougher, and don't allow LSRR+SSRR (RFC7126).


# 1.368 05-Feb-2018 maxv

Kick duplicate options, they are not allowed (RFC791).


# 1.367 05-Feb-2018 maxv

Remove unused variable.


# 1.366 05-Feb-2018 maxv

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.


# 1.365 05-Feb-2018 maxv

Style, no functional change.


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as a temporal storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of panicks reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of intefaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later if we confirmed
nobody does actually.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto lables

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places where are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
netowrk. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.2; 1.319.4; 1.319.6; 1.319.10;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong to. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsiged decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

branches: 1.298.2; 1.298.6; 1.298.8;
check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checkesum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.
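
The "intrq on the stack" idea from 1.286 can be sketched as follows. This sketch rests on some assumptions. struct pkt, enqueue_pkt() and drain_queue() are hypothetical, and a C11 atomic_flag spinlock stands in for kernel_lock. The consumer takes the lock once, detaches the whole shared queue into a local variable, and then walks the packets with no lock held.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical packet record; the kernel queues mbufs instead. */
struct pkt {
	struct pkt *next;
	int len;
};

/* Spinlock standing in for kernel_lock (illustration only). */
static atomic_flag big_lock = ATOMIC_FLAG_INIT;

/* Shared input queue, protected by big_lock. */
static struct pkt *shared_head;
static struct pkt **shared_tailp = &shared_head;

/* Instrumentation: how often the consumer took the lock. */
static int drain_lock_count;

static void
big_lock_enter(void)
{
	while (atomic_flag_test_and_set_explicit(&big_lock,
	    memory_order_acquire))
		;	/* spin */
}

static void
big_lock_exit(void)
{
	atomic_flag_clear_explicit(&big_lock, memory_order_release);
}

/* Producer: append one packet at the tail under the lock. */
void
enqueue_pkt(struct pkt *p)
{
	p->next = NULL;
	big_lock_enter();
	*shared_tailp = p;
	shared_tailp = &p->next;
	big_lock_exit();
}

/*
 * Consumer: lock once, move the whole queue onto the stack, unlock,
 * then process each packet unlocked.  Returns the bytes processed.
 */
int
drain_queue(void)
{
	struct pkt *local, *p;
	int total = 0;

	big_lock_enter();
	drain_lock_count++;
	local = shared_head;
	shared_head = NULL;
	shared_tailp = &shared_head;
	big_lock_exit();

	/* Stand-in for handing each packet to ip_input(). */
	for (p = local; p != NULL; p = p->next)
		total += p->len;
	return total;
}
```

The lock is taken once per drain rather than once per packet, which is where the saving comes from. Packets enqueued after the detach wait for the next call.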


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.
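
The following is a rough sketch of the per-CPU arrays of uint64_t counters from 1.264 and 1.266. The counter names, NCPU_SKETCH and both functions are hypothetical. A real kernel must also keep the incrementing thread on one CPU for the duration of the update, and this single-threaded example ignores that requirement.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical counter indices; not the real IP_STAT_* list. */
enum {
	IP_STAT_TOTAL,		/* packets received */
	IP_STAT_BADSUM,		/* bad header checksums */
	IP_STAT_FORWARD,	/* packets forwarded */
	IP_NSTATS
};

#define NCPU_SKETCH 4		/* hypothetical CPU count */

/* One private counter array per CPU: no shared lock on the hot path. */
static uint64_t ipstat_percpu[NCPU_SKETCH][IP_NSTATS];

/* Fast path: each CPU bumps only its own slot. */
void
ip_stat_inc(int cpu, int stat)
{
	ipstat_percpu[cpu][stat]++;
}

/* Slow path (e.g. a sysctl read): sum the per-CPU arrays into one. */
void
ip_stat_collate(uint64_t out[IP_NSTATS])
{
	memset(out, 0, IP_NSTATS * sizeof(out[0]));
	for (int cpu = 0; cpu < NCPU_SKETCH; cpu++)
		for (int i = 0; i < IP_NSTATS; i++)
			out[i] += ipstat_percpu[cpu][i];
}
```

Collation is the only place the per-CPU arrays are combined. As a result, a sysctl reader may see slightly stale totals while the packet path avoids any shared lock.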


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetics that lead to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
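
The general shape of a Fisher-Yates shuffle over a sliding window, as named in 1.262, might look like the sketch below. It illustrates the technique under several assumptions and is not the committed code. The id space is shrunk to 256 values and the window to half of that. A deterministic xorshift32 generator replaces arc4random() purely so the behaviour is reproducible, and it is not suitable for real ids.

```c
#include <stdint.h>

#define ID_SPACE  256		/* hypothetical; a real 16-bit ip_id has 65536 */
#define ID_WINDOW (ID_SPACE / 2)

static uint16_t id_shuffle[ID_SPACE];
static unsigned id_index;
static uint32_t rng_state = 2463534242u;	/* hypothetical fixed seed */

/* xorshift32: a deterministic stand-in for arc4random(), NOT secure. */
static uint32_t
rng_next(void)
{
	uint32_t x = rng_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return rng_state = x;
}

/* Fill id_shuffle with a random permutation (Durstenfeld's Fisher-Yates). */
void
id_init(void)
{
	for (unsigned i = 0; i < ID_SPACE; i++)
		id_shuffle[i] = (uint16_t)i;
	for (unsigned i = ID_SPACE - 1; i > 0; i--) {
		unsigned j = rng_next() % (i + 1);
		uint16_t t = id_shuffle[i];

		id_shuffle[i] = id_shuffle[j];
		id_shuffle[j] = t;
	}
	id_index = 0;
}

/*
 * Return the id in the current slot, then park it in a random slot at
 * most ID_WINDOW - 1 positions behind (mod ID_SPACE).  The index cannot
 * reach that slot again for at least ID_SPACE - ID_WINDOW + 1 calls,
 * and later swaps can only postpone the id's next appearance.
 */
uint16_t
id_next(void)
{
	unsigned i = id_index % ID_SPACE;
	unsigned back = rng_next() % ID_WINDOW;
	unsigned j = (id_index + ID_SPACE - back) % ID_SPACE;
	uint16_t r = id_shuffle[i];

	id_shuffle[i] = id_shuffle[j];
	id_shuffle[j] = r;
	id_index++;
	return r;
}
```

The window trades unpredictability against reuse distance. A larger ID_WINDOW gives each returned id more possible parking slots, but it lets the id reappear sooner.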


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
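
The two minimum-size rules from 1.258 and 1.259 can be expressed as a single predicate. This is a sketch and not the code in ip_input.c. The macro names carry a _SKETCH suffix to mark them as stand-ins for the usual IP_MF and IP_OFFMASK values. The sketch also assumes that ip_len is the fragment's total length including its header, in host byte order. The value 68 is the RFC 791 minimum that every module must forward unfragmented: a 60-byte maximum header plus 8 data bytes.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MF_SKETCH          0x2000	/* "more fragments" flag in ip_off */
#define IP_OFFMASK_SKETCH     0x1fff	/* offset field, in 8-byte units */
#define IP_MINFRAGSIZE_SKETCH 68	/* RFC 791: 60-byte header + 8 bytes */

/*
 * Return true if a fragment passes both minimum-size rules.
 * ip_off and ip_len are assumed to be in host byte order, and ip_len
 * is assumed to include the IP header.
 */
bool
frag_size_ok(uint16_t ip_off, uint16_t ip_len)
{
	unsigned off_bytes = (unsigned)(ip_off & IP_OFFMASK_SKETCH) * 8;

	/* A non-initial fragment may not start inside the first 68 bytes. */
	if (off_bytes != 0 && off_bytes < IP_MINFRAGSIZE_SKETCH)
		return false;
	/* The first fragment of a fragmented datagram must be >= 68 bytes. */
	if (off_bytes == 0 && (ip_off & IP_MF_SKETCH) &&
	    ip_len < IP_MINFRAGSIZE_SKETCH)
		return false;
	return true;
}
```

Without the first rule, an attacker could send a tiny trailing fragment that overlaps and overwrites the TCP header carried in the first fragment.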


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating, and
freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
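
The per-domain list of live route caches in item 4 can be modelled with a singly linked list. This is a simplified sketch. The _sketch types and functions are hypothetical, and reference counting is reduced to a plain integer with no locking.

```c
#include <stddef.h>

/* Hypothetical route entry; only the reference count is modelled. */
struct rtentry_sketch {
	int refcnt;
};

/* Hypothetical route cache (cf. struct route). */
struct route_sketch {
	struct route_sketch *next;	/* link on the domain's cache list */
	struct rtentry_sketch *rt;	/* cached route, or NULL */
};

/* Hypothetical domain holding its list of live caches (cf. dom_rtcache). */
struct domain_sketch {
	struct route_sketch *dom_rtcache;
};

/*
 * Cache a route and put the cache on the domain's list.
 * Assumes ro does not already hold a route (is not on the list).
 */
void
rtcache_set_sketch(struct domain_sketch *dom, struct route_sketch *ro,
    struct rtentry_sketch *rt)
{
	rt->refcnt++;
	ro->rt = rt;
	ro->next = dom->dom_rtcache;
	dom->dom_rtcache = ro;
}

/*
 * Invalidate every cache in the domain: drop each cached reference and
 * empty the list, so every cache user must look its route up again.
 */
void
rtflushall_sketch(struct domain_sketch *dom)
{
	struct route_sketch *ro, *next;

	for (ro = dom->dom_rtcache; ro != NULL; ro = next) {
		next = ro->next;
		ro->rt->refcnt--;
		ro->rt = NULL;
		ro->next = NULL;
	}
	dom->dom_rtcache = NULL;
}
```

Callers then only need to test the cached pointer for NULL to know whether a fresh lookup is required, which matches the NULL checks the log describes.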


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement's correctness
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards, ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them: proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
which got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Before, if there was no route, we would call icmp_error with a datagram
packet that has an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum
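
For reference, software verification of an IPv4 header checksum follows the RFC 1071 one's-complement sum. The sketch below is the textbook algorithm and not NetBSD's in_cksum(), which works on mbuf chains and has per-architecture optimizations. cksum_sketch() is a hypothetical name. It assumes header-sized buffers, which keeps the 32-bit accumulator from overflowing.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum over 16-bit big-endian words.  Over an
 * IPv4 header whose checksum field is zero, the result is the value to
 * store.  Over a received header, a result of 0 means the sum is valid.
 */
uint16_t
cksum_sketch(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)		/* pad a trailing odd byte with zero */
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)	/* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

This software path, taken when no hardware offload has vouched for the header, is where 1.204 fixed the bad-checksum counter.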


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpacketsr are now declared static
and uninitialized (bss), to discourage tampering with them.


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which is most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add an IP(v4)-specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

back out previous; we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (it could also be
abused as a DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variables/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have a proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port numbers assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during the KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there may be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase an "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets that have a Class-D address as the source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently supports only netatalk
and netinet, and has only been tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
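The 1.37 fix can be seen in a small C sketch of the fragment test on a host-order ip_off field. The flag values match <netinet/ip.h>; the helper names are hypothetical, and ip_is_fragment_naive only illustrates the assumption described in the entry, not the original code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits of the host-order ip_off field (values as in <netinet/ip.h>). */
#define IP_RF      0x8000   /* reserved fragment flag, must be zero */
#define IP_DF      0x4000   /* don't fragment */
#define IP_MF      0x2000   /* more fragments */
#define IP_OFFMASK 0x1fff   /* fragment offset, in 8-byte units */

static_assert((IP_RF | IP_DF | IP_MF | IP_OFFMASK) == 0xffff,
    "flags and offset together cover the 16-bit field");

/* Assumes RF is always zero: a stray reserved bit makes a datagram look fragmented. */
static bool ip_is_fragment_naive(uint16_t ip_off)
{
    return (ip_off & ~IP_DF) != 0;
}

/* Masks out both DF and RF: only MF or a non-zero offset marks a fragment. */
static bool ip_is_fragment(uint16_t ip_off)
{
    return (ip_off & ~(IP_DF | IP_RF)) != 0;
}

/* Fragment offset converted from 8-byte units to bytes. */
static uint16_t ip_frag_offset_bytes(uint16_t ip_off)
{
    return (uint16_t)((ip_off & IP_OFFMASK) << 3);
}
```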


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.399 19-Feb-2021 christos

- Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]


# 1.398 14-Feb-2021 christos

- centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 lt2p where it was aligning for ipv4 and pulling
for ipv6.


Revision tags: thorpej-futex-base
# 1.397 28-Aug-2020 ozaki-r

inet: reduce silent packet discards


# 1.396 28-Aug-2020 ozaki-r

inet: pull m_get_rcvif_psref out of ip_input for simplicity

Same as ip6_input.


# 1.395 28-Aug-2020 ozaki-r

ipsec: rename ipsec_ip_input to ipsec_ip_input_checkpolicy

Because it just checks if a packet passes security policies.


# 1.394 28-Aug-2020 ozaki-r

inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.393 13-Nov-2019 ozaki-r

Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.


# 1.392 19-Sep-2019 ozaki-r

Apply some missing changes lost on the previous commit


# 1.391 19-Sep-2019 ozaki-r

Avoid having a rtcache directly in a percpu storage

percpu(9) keeps a memory area for each CPU and hands out pieces of it to
users. When the areas run short, percpu(9) enlarges them by allocating new,
larger memory areas, replacing the old ones with them and destroying the old
ones. So a percpu storage referenced by a pointer obtained via percpu_getref
can be destroyed by this mechanism once the running thread sleeps, even if
percpu_putref has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.390 15-Sep-2019 bouyer

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.389 13-May-2019 ozaki-r

branches: 1.389.2;
Count packets dropped by pfil


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.388 17-Jan-2019 knakahara

Fix ipsecif(4) cannot apply input direction packet filter. Reviewed by ozaki-r@n.o and ryo@n.o.

Add ATF later.


Revision tags: pgoyette-compat-1226 pgoyette-compat-1126
# 1.387 15-Nov-2018 maxv

Remove the 't' argument from m_tag_find().


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.386 02-Sep-2018 maxv

remove reference to ipnat, and duplicate comments


Revision tags: pgoyette-compat-0728
# 1.385 10-Jul-2018 maxv

Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.384 17-May-2018 maxv

branches: 1.384.2;
Add KASSERTs, related to PR/39794.


# 1.383 14-May-2018 maxv

Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.


# 1.382 10-May-2018 maxv

Rename ipsec4_forward -> ipsec_mtu, and switch to void.


Revision tags: pgoyette-compat-0502
# 1.381 26-Apr-2018 maxv

Remove unused mbuf argument from sbsavetimestamp.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.380 15-Apr-2018 maxv

Introduce a m_verify_packet function, that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.


# 1.379 11-Apr-2018 maxv

Don't pass IP_ALLOWBROADCAST in ipsec4_input. The flag lands in
ipsec_getpolicybyaddr, and only IP_FORWARDING is taken.

In fact it would be good to change the 'flags' argument of ipsec4_input
to be a boolean, same for ipsec_getpolicybyaddr. It would be less
misleading.


# 1.378 11-Apr-2018 maxv

Add comment about IPsec.


# 1.377 11-Apr-2018 maxv

Small changes in ip_dooptions: replace bcopy by memcpy, the areas can't
overlap.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.376 24-Feb-2018 ozaki-r

branches: 1.376.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because the lock ordering rule is violated:
softnet_lock is acquired while IFNET_LOCK is held.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to make sure the network stack stays protected.
This is required, for example, for PR kern/51356.

Fix PR kern/53043


# 1.375 09-Feb-2018 maxv

Remove dead code.


# 1.374 07-Feb-2018 maxv

Remove null check on ip, it can't be null. (Confuses code scanners.)


# 1.373 06-Feb-2018 maxv

Typos and style a bit, no functional change.


# 1.372 05-Feb-2018 maxv

Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.


# 1.371 05-Feb-2018 maxv

Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.


# 1.370 05-Feb-2018 maxv

Clean up this mess. This is typically the kind of places where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.


# 1.369 05-Feb-2018 maxv

Be tougher, and don't allow LSRR+SSRR (RFC7126).


# 1.368 05-Feb-2018 maxv

Kick duplicate options, they are not allowed (RFC791).
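The checks described in the 1.369 and 1.368 entries, rejecting an option that appears twice and rejecting LSRR combined with SSRR, can be sketched as a single scan over the options area. This is a hypothetical stand-alone validator, not ip_dooptions(); it checks only option type, length, and repetition, and it treats NOP and EOL as padding that may repeat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Option type codes from RFC 791 (same values as IPOPT_* in <netinet/ip.h>). */
enum {
    OPT_EOL  = 0,
    OPT_NOP  = 1,
    OPT_LSRR = 131,
    OPT_SSRR = 137,
};

/*
 * Hypothetical validator for the options area of an IPv4 header (the bytes
 * after the fixed 20-byte header). Rejects malformed lengths, any option
 * type that appears twice, and LSRR together with SSRR.
 */
static bool ip_options_ok(const uint8_t *opts, size_t len)
{
    assert(opts != NULL || len == 0);
    bool seen[256] = { false };
    size_t i = 0;

    while (i < len) {
        uint8_t type = opts[i];
        if (type == OPT_EOL)
            break;
        if (type == OPT_NOP) {          /* padding; repeats are fine */
            i++;
            continue;
        }
        if (i + 1 >= len)               /* no room for the length byte */
            return false;
        uint8_t olen = opts[i + 1];
        if (olen < 2 || olen > len - i) /* must cover type+len and fit */
            return false;
        if (seen[type])                 /* duplicate option */
            return false;
        seen[type] = true;
        i += olen;
    }
    return !(seen[OPT_LSRR] && seen[OPT_SSRR]);
}
```

The duplicate check alone cannot catch LSRR+SSRR, since they are distinct option types, which is why the combination is tested separately after the scan.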


# 1.367 05-Feb-2018 maxv

Remove unused variable.


# 1.366 05-Feb-2018 maxv

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.


# 1.365 05-Feb-2018 maxv

Style, no functional change.


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, for example by reference
counting or passive references. Regardless of the method, we need to call a
release function for a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is kept
separate to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Storing a pointer to an interface in an mbuf isn't safe once the big kernel
locks are removed; an interface object (ifnet) can be destroyed at any time
during packet processing, and accessing such an object via a pointer is racy.
Instead we have to look up the object in the interface collection
(ifindex2ifnet) via an interface index (if_index) that is stored in the mbuf
instead of a pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleepable critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some psref overhead to performance-sensitive paths;
however, the overhead is not serious: at worst, performance drops by 2%.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.2; 1.319.4; 1.319.6; 1.319.10;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on IPsec being enabled and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

branches: 1.298.2; 1.298.6; 1.298.8;
check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
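A minimal sketch of the deferred-drain pattern described above, written in user-space C with C11 atomics standing in for the kernel's primitives. The names frag_drain(), frag_fasttimo(), frag_do_drain() and frag_drain_runs are hypothetical; the actual NetBSD routines and their locking differ.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical drain-request flag (not the real NetBSD variable). */
static atomic_bool frag_drain_needed = false;

/* Counts how many times the (hypothetical) expensive drain ran. */
static int frag_drain_runs = 0;

/* Stand-in for the real work, e.g. freeing queued fragments. */
static void frag_do_drain(void)
{
	frag_drain_runs++;
}

/*
 * Called from the memory-pressure path, possibly with locks held:
 * only record that a drain is wanted, never do the work here.
 */
void frag_drain(void)
{
	atomic_store(&frag_drain_needed, true);
}

/*
 * Called periodically from the fast timeout handler, where taking the
 * locks the drain needs is safe. atomic_exchange() clears the flag and
 * reports whether it was set, so several drain requests that arrive
 * between two timer ticks collapse into a single pass.
 */
void frag_fasttimo(void)
{
	if (atomic_exchange(&frag_drain_needed, false))
		frag_do_drain();
}
```

The drain hook itself stays trivially safe to call from any context; all cost moves to a place whose locking rules are known.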


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.
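The general pattern of moving queued packets onto a local list under a single lock acquisition can be sketched as follows. This illustrates the idea only: a pthread mutex stands in for the kernel lock, and struct pkt, struct pktq, pktq_enqueue() and pktq_take_all() are hypothetical, not the NetBSD ipintr() code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical packet and queue types for this sketch. */
struct pkt {
	struct pkt *next;
	int id;
};

struct pktq {
	pthread_mutex_t lock;   /* stands in for the kernel lock */
	struct pkt *head, *tail;
};

/* Producer side (e.g. a driver's receive path): append one packet. */
void pktq_enqueue(struct pktq *q, struct pkt *p)
{
	p->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail != NULL)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	pthread_mutex_unlock(&q->lock);
}

/*
 * Consumer side: take the lock once, detach the whole shared queue
 * onto a local list, and drop the lock. The caller then walks the
 * returned list (calling the input routine per packet) unlocked.
 */
struct pkt *pktq_take_all(struct pktq *q)
{
	pthread_mutex_lock(&q->lock);
	struct pkt *local = q->head;
	q->head = q->tail = NULL;
	pthread_mutex_unlock(&q->lock);
	return local;
}
```

The lock is held for a few pointer assignments regardless of how many packets are waiting, instead of once per dequeued packet.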


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.
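On the receiving side, an application that has enabled IP_RECVTTL with setsockopt(2) walks the control buffer returned by recvmsg(2) to find the TTL. The sketch below shows only that parsing step. It assumes BSD-style delivery (cmsg_type IP_RECVTTL with a one-byte payload) and, so that it also builds on Linux, Linux's variant (cmsg_type IP_TTL with an int payload); ttl_from_msghdr() is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Return the TTL found in a recvmsg() control buffer, or -1 if none.
 * Assumed encodings: BSD uses IP_RECVTTL with a uint8_t payload;
 * Linux uses IP_TTL with an int payload.
 */
int ttl_from_msghdr(struct msghdr *msg)
{
	for (struct cmsghdr *cm = CMSG_FIRSTHDR(msg); cm != NULL;
	    cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level != IPPROTO_IP)
			continue;
#ifdef __linux__
		if (cm->cmsg_type == IP_TTL &&
		    cm->cmsg_len >= CMSG_LEN(sizeof(int))) {
			int ttl;
			/* memcpy avoids assuming CMSG_DATA is int-aligned */
			memcpy(&ttl, CMSG_DATA(cm), sizeof(ttl));
			return ttl;
		}
#else
		if (cm->cmsg_type == IP_RECVTTL &&
		    cm->cmsg_len >= CMSG_LEN(sizeof(uint8_t)))
			return *(const uint8_t *)CMSG_DATA(cm);
#endif
	}
	return -1;
}
```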


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
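A sketch of per-CPU counters that are collated only when read. The counter indices, the fixed CPU count and the function names are illustrative assumptions; the kernel has its own per-CPU allocation machinery and must keep a thread on one CPU while it increments.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative counter indices (a real header defines many more). */
enum { STAT_TOTAL, STAT_BADSUM, STAT_FORWARD, STAT_NSTATS };

#define NCPU_SKETCH 4  /* hypothetical CPU count */

/* One private row of counters per CPU. */
static uint64_t percpu_stats[NCPU_SKETCH][STAT_NSTATS];

/* Fast path: each CPU writes only its own row, so no lock is taken. */
void stat_inc(unsigned cpu, unsigned which)
{
	percpu_stats[cpu][which]++;
}

/* Slow path: sum all rows when a reader (e.g. sysctl) asks. */
void stat_collate(uint64_t out[STAT_NSTATS])
{
	memset(out, 0, STAT_NSTATS * sizeof(out[0]));
	for (unsigned c = 0; c < NCPU_SKETCH; c++)
		for (unsigned s = 0; s < STAT_NSTATS; s++)
			out[s] += percpu_stats[c][s];
}
```

Keeping the statistics as a flat array of uint64_t (as revision 1.264 below does) is what makes this collation a simple element-wise sum.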


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetics that lead to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
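One way to build such a generator is an incremental shuffle over a 65536-entry permutation table: each call swaps the slot at a moving index with a random slot up to 32767 positions behind it and returns the value that lands at the moving index. The sketch below follows that idea; the window size, the table layout, the function names and especially the xorshift32 generator (a non-cryptographic placeholder for arc4random()) are assumptions, not the committed code.

```c
#include <stdint.h>

#define ID_SPACE  65536u  /* every possible 16-bit ID */
#define ID_WINDOW 0x7fffu /* assumed sliding-window size */

static uint16_t id_table[ID_SPACE];  /* always a permutation of 0..65535 */
static uint16_t id_index;            /* moving index, wraps at 65536 */
static uint32_t prng_state = 0x12345678u;

/* NON-cryptographic placeholder; a kernel needs a real CSPRNG. */
static uint32_t placeholder_random(void)
{
	uint32_t x = prng_state;
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return prng_state = x;
}

void ip_id_init(void)
{
	for (uint32_t i = 0; i < ID_SPACE; i++)
		id_table[i] = (uint16_t)i;
	id_index = 0;
}

uint16_t ip_id_next(void)
{
	uint16_t r;

	do {
		uint16_t i = id_index;
		/* random slot in the window behind i; wraps modulo 65536 */
		uint16_t j = (uint16_t)(id_index -
		    (placeholder_random() & ID_WINDOW));
		r = id_table[j];
		id_table[j] = id_table[i];
		id_table[i] = r;
		id_index++;
	} while (r == 0);  /* this sketch never hands out ID 0 */
	return r;
}
```

Because every step is a swap, the table stays a permutation, so no value is ever lost from or duplicated in the pool.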


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
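A sketch of the two checks from revisions 1.258 and 1.259, assuming the 68-byte boundary is measured from the start of the original datagram (fragment offset plus header length) and that a first fragment is one with offset 0 and IP_MF set. MIN_FRAG_BYTES and frag_acceptable() are hypothetical names, and the kernel's exact comparison may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MF          0x2000u /* more-fragments flag in ip_off */
#define IP_OFFMASK     0x1fffu /* fragment offset, in 8-byte units */
#define MIN_FRAG_BYTES 68u     /* hypothetical name for the 68-byte limit */

/*
 * ip_off: flags+offset field in host byte order; hlen: IP header
 * length in bytes; ip_len: total length of this packet in bytes.
 * Returns false for fragments that should be dropped.
 */
bool frag_acceptable(uint16_t ip_off, unsigned hlen, unsigned ip_len)
{
	unsigned off = (unsigned)(ip_off & IP_OFFMASK) << 3;

	if (off == 0 && !(ip_off & IP_MF))
		return true;                          /* not a fragment */
	if (off > 0)
		return off + hlen >= MIN_FRAG_BYTES;  /* non-initial fragment */
	return ip_len >= MIN_FRAG_BYTES;              /* first fragment */
}
```

With both rules in place, the first 68 bytes of a datagram (where the transport header lives) can only come from a single first fragment, so a later fragment cannot overwrite them.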


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement's correctness
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
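The chaining scheme described above (link every IPv4 cache that holds a route, unlink on flush, walk the chain on flush-all) might look roughly like this. The _sk-suffixed types and functions are hypothetical stand-ins for struct route, in_rtcache(), in_rtflush() and in_rtflushall(), and reference counting is reduced to a plain counter.

```c
#include <stddef.h>

/* Hypothetical route entry: only a reference count. */
struct rtentry_sk {
	int refcnt;
};

/* Hypothetical route cache linked into a per-domain chain. */
struct route_sk {
	struct rtentry_sk *ro_rt;  /* cached route, or NULL */
	struct route_sk *next;     /* next cache in the chain */
	struct route_sk **prevp;   /* pointer that points at us */
};

static struct route_sk *rtcache_chain;  /* all caches holding a route */

/* Invalidate one cache: drop its reference and unlink it. */
void rtflush_sk(struct route_sk *ro)
{
	if (ro->ro_rt == NULL)
		return;
	ro->ro_rt->refcnt--;
	ro->ro_rt = NULL;
	*ro->prevp = ro->next;
	if (ro->next != NULL)
		ro->next->prevp = ro->prevp;
}

/* Cache a route; ro must be zero-initialized or previously used. */
void rtcache_sk(struct route_sk *ro, struct rtentry_sk *rt)
{
	rtflush_sk(ro);  /* forget any route cached earlier */
	rt->refcnt++;
	ro->ro_rt = rt;
	ro->next = rtcache_chain;
	if (rtcache_chain != NULL)
		rtcache_chain->prevp = &ro->next;
	ro->prevp = &rtcache_chain;
	rtcache_chain = ro;
}

/* Invalidate every cache, e.g. after a route is added. */
void rtflushall_sk(void)
{
	while (rtcache_chain != NULL)
		rtflush_sk(rtcache_chain);
}
```

Only caches that actually hold a route sit on the chain, so a flush-all touches exactly the entries that could be stale.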


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
existing handler; its CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error() with a datagram
whose checksum was incorrect. (From Liam Foy)
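A sketch of the ordering this change enforces, on a raw 20-byte IPv4 header with no options. The routing lookup is reduced to a have_route flag and the ICMP paths to return codes; forward(), set_cksum() and in_cksum_hdr() are hypothetical helpers, the last being a plain RFC 1071 checksum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_HLEN    20  /* header without options */
#define IPV4_TTL_OFF  8  /* byte offset of the TTL field */
#define IPV4_SUM_OFF 10  /* byte offset of the header checksum */

/* RFC 1071 Internet checksum over an even-length buffer. */
static uint16_t in_cksum_hdr(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)(p[i] << 8 | p[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Recompute and store the header checksum. */
static void set_cksum(uint8_t *hdr)
{
	hdr[IPV4_SUM_OFF] = hdr[IPV4_SUM_OFF + 1] = 0;
	uint16_t s = in_cksum_hdr(hdr, IPV4_HLEN);
	hdr[IPV4_SUM_OFF] = (uint8_t)(s >> 8);
	hdr[IPV4_SUM_OFF + 1] = (uint8_t)s;
}

enum fwd_result { FWD_SENT, FWD_TTL_EXCEEDED, FWD_NO_ROUTE };

/*
 * The TTL is decremented (and the checksum refreshed) only once we
 * know the packet will be sent, so a header quoted in an ICMP error
 * still carries its original, correctly checksummed contents.
 */
enum fwd_result forward(uint8_t *hdr, bool have_route)
{
	if (hdr[IPV4_TTL_OFF] <= 1)
		return FWD_TTL_EXCEEDED;  /* would send ICMP time exceeded */
	if (!have_route)
		return FWD_NO_ROUTE;      /* would send ICMP unreachable */
	hdr[IPV4_TTL_OFF]--;
	set_cksum(hdr);
	return FWD_SENT;
}
```

A header with a valid checksum sums to zero under in_cksum_hdr(), which is how a receiver of the ICMP error would verify the quoted header.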


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
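
A minimal sketch of the cached-limit and drop-half logic described above follows. It uses plain C globals in place of the kernel's. The names nmbclusters, ip_nmbclusters and ip_maxfrags come from the log. check_nmbcluster_params() and frags_to_drop() are hypothetical helpers, and the exact threshold comparison and rounding of the "half" are assumptions, not the committed code.

```c
#include <stdbool.h>

/* Stand-ins for kernel globals; the value of nmbclusters is arbitrary. */
static int nmbclusters = 1024;  /* system-wide mbuf cluster limit */
static int ip_nmbclusters;      /* cached copy, 0 until first check */
static int ip_maxfrags;         /* threshold for the drop-half phase */

/* Recompute the IP limits derived from nmbclusters if it has changed. */
static void
check_nmbcluster_params(void)
{
	if (ip_nmbclusters != nmbclusters) {
		ip_nmbclusters = nmbclusters;
		ip_maxfrags = ip_nmbclusters / 4;
	}
}

/*
 * Hypothetical helper: how many queued fragments to discard.
 * Nothing is dropped at or below the threshold unless mbufs are
 * short; otherwise half of all queued fragments go (multiplicative drop).
 */
static int
frags_to_drop(int nfrags, bool mbuf_pressure)
{
	check_nmbcluster_params();
	if (!mbuf_pressure && nfrags <= ip_maxfrags)
		return 0;
	return nfrags / 2;
}
```

Each pass halves the queue. A backlog far above the threshold is therefore brought back under it in a number of passes that grows only logarithmically with the ratio of backlog to threshold.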


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4)-specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() corrides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

backout previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
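
A sketch of the 127/8 check follows. It assumes addresses already converted to host byte order. It also assumes a hypothetical flag telling whether the packet arrived on a loopback interface, where 127/8 addresses are legitimate. The helper names are illustrative, not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOOPBACK_NET  0x7f000000u  /* 127.0.0.0 */
#define LOOPBACK_MASK 0xff000000u  /* /8 */

/* True if a host-order address lies in 127.0.0.0/8. */
static bool
in_loopback_net(uint32_t addr)
{
	return (addr & LOOPBACK_MASK) == LOOPBACK_NET;
}

/*
 * True if the packet should be dropped: a 127/8 source or destination
 * must not appear on the wire (RFC 1122), so such packets are accepted
 * only when they came in over a loopback interface.
 */
static bool
loopback_martian(uint32_t src, uint32_t dst, bool rcvif_is_loopback)
{
	if (rcvif_is_loopback)
		return false;
	return in_loopback_net(src) || in_loopback_net(dst);
}
```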


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to the set minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL
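
A sketch of how hostzerobroadcast changes the broadcast test for one subnet follows. It assumes host-order addresses and a single subnet/mask pair, whereas the kernel checks every address configured on the interface. subnet_is_broadcast() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the global; nonzero (the default) keeps the old behavior. */
static int hostzerobroadcast = 1;

static bool
subnet_is_broadcast(uint32_t addr, uint32_t subnet, uint32_t mask)
{
	if (addr == (uint32_t)(subnet | ~mask))
		return true;    /* all-ones host part: always broadcast */
	if (hostzerobroadcast && addr == subnet)
		return true;    /* all-zeros host part: only if enabled */
	return false;
}
```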


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets, that have a Class-D address as source address.
Implements the first half of PR 7003.
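
Class D is 224.0.0.0/4: the top four address bits are 1110. The source test therefore reduces to one mask and one compare. This sketch assumes a host-order address. <netinet/in.h> provides the IN_MULTICAST() macro for the same purpose, and src_is_class_d() is only an illustrative name.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a host-order address is class D (224.0.0.0/4). */
static bool
src_is_class_d(uint32_t addr)
{
	return (addr & 0xf0000000u) == 0xe0000000u;
}
```

In the input path, a packet whose ip_src passes this test would be dropped rather than processed further.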


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
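
ip_hl counts 32-bit words, so the header is ip_hl << 2 bytes long. A datagram whose total length ip_len is smaller than that cannot be well formed. A minimal sketch, assuming ip_len is already in host byte order; ip_len_ok() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reject a datagram whose total length is shorter than its own header. */
static bool
ip_len_ok(uint16_t ip_len, uint8_t ip_hl)
{
	unsigned int hlen = (unsigned int)ip_hl << 2;  /* words -> bytes */

	return ip_len >= hlen;
}
```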


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
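
The incremental adjustment mentioned in 1.63 can be illustrated with the RFC 1624 update rule HC' = ~(~HC + ~m + m'), where m and m' are the old and new values of the 16-bit word that holds TTL and protocol. This is a minimal sketch and not the ipflow code. The helper names are hypothetical. For readability, the header is modeled as ten 16-bit words in host byte order; the ones' complement sum itself does not depend on byte order.

```c
#include <assert.h>
#include <stdint.h>

/* RFC 1071 ones' complement checksum over n 16-bit words. */
static uint16_t cksum_words(const uint16_t *w, int n)
{
	uint32_t sum = 0;

	for (int i = 0; i < n; i++)
		sum += w[i];
	while (sum >> 16)			/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'). */
static uint16_t cksum_adjust(uint16_t hc, uint16_t m_old, uint16_t m_new)
{
	uint32_t sum = (uint16_t)~hc;

	sum += (uint16_t)~m_old;
	sum += m_new;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Hypothetical helper: word 4 is (TTL << 8 | protocol), word 5 is the
 * header checksum.  The caller must already have checked that TTL > 0.
 */
static void decrement_ttl(uint16_t hdr[10])
{
	uint16_t m_old = hdr[4];
	uint16_t m_new;

	assert((m_old >> 8) > 0);
	m_new = (uint16_t)(m_old - 0x0100);	/* TTL is the high byte */
	hdr[4] = m_new;
	hdr[5] = cksum_adjust(hdr[5], m_old, m_new);
}
```

For a sample header (TTL 64, UDP, 192.168.0.1 to 192.168.0.199), the full checksum is 0xb861. After decrement_ttl(), the TTL/protocol word becomes 0x3f11 and the incrementally updated checksum, 0xb961, matches a full recomputation.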


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handing: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>
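
The 1.41 check bounds the size of the reassembled datagram before the fragments are joined. The following is a minimal sketch under simplifying assumptions: each fragment is described by a hypothetical struct holding its offset (in 8-byte units, as carried in ip_off) and its payload length. The datagram's end is taken as the furthest byte any fragment reaches. IP_MAXPACKET is assumed to be 65535.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_IP_MAXPACKET	65535	/* assumed value of IP_MAXPACKET */

struct sk_frag {		/* hypothetical fragment descriptor */
	uint16_t off8;		/* fragment offset in 8-byte units */
	uint16_t len;		/* payload bytes in this fragment */
};

/* Return false if the reassembled datagram would exceed IP_MAXPACKET. */
static bool reass_len_ok(size_t hlen, const struct sk_frag *frags, size_t n)
{
	size_t total = hlen;

	assert(frags != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		size_t end = hlen + (size_t)frags[i].off8 * 8 + frags[i].len;
		if (end > total)
			total = end;
	}
	return total <= SK_IP_MAXPACKET;
}
```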


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
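
The point of 1.37 is that a "is this a fragment?" test must look only at IP_MF and the offset bits, not at "anything other than DF". The sketch below contrasts the two. The fragile variant illustrates this class of bug and is not the code fixed in PR 2772. The mask values match the IP_RF/IP_DF/IP_MF/IP_OFFMASK definitions in <netinet/ip.h>; they are redefined under local names so that the block stands alone. ip_off is assumed to be in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_IP_RF	0x8000	/* reserved, "must be zero" */
#define SK_IP_DF	0x4000	/* don't fragment */
#define SK_IP_MF	0x2000	/* more fragments */
#define SK_IP_OFFMASK	0x1fff	/* fragment offset, 8-byte units */

static_assert(((SK_IP_RF | SK_IP_DF) & (SK_IP_MF | SK_IP_OFFMASK)) == 0,
    "flag bits must not overlap the fragment bits");

/* Fragile: any bit except DF counts as "fragmented", so RF misfires. */
static bool is_fragment_fragile(uint16_t ip_off)
{
	return (ip_off & ~SK_IP_DF) != 0;
}

/* Robust: only MF or a nonzero offset make a packet a fragment. */
static bool is_fragment(uint16_t ip_off)
{
	return (ip_off & (SK_IP_MF | SK_IP_OFFMASK)) != 0;
}
```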


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.398 14-Feb-2021 christos

- centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 lt2p where it was aligning for ipv4 and pulling
for ipv6.


Revision tags: thorpej-futex-base
# 1.397 28-Aug-2020 ozaki-r

inet: reduce silent packet discards


# 1.396 28-Aug-2020 ozaki-r

inet: pull m_get_rcvif_psref out of ip_input for simplicity

Same as ip6_input.


# 1.395 28-Aug-2020 ozaki-r

ipsec: rename ipsec_ip_input to ipsec_ip_input_checkpolicy

Because it just checks if a packet passes security policies.


# 1.394 28-Aug-2020 ozaki-r

inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.393 13-Nov-2019 ozaki-r

Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.


# 1.392 19-Sep-2019 ozaki-r

Apply some missing changes lost on the previous commit


# 1.391 19-Sep-2019 ozaki-r

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.390 15-Sep-2019 bouyer

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.389 13-May-2019 ozaki-r

branches: 1.389.2;
Count packets dropped by pfil


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.388 17-Jan-2019 knakahara

Fix ipsecif(4) cannot apply input direction packet filter. Reviewed by ozaki-r@n.o and ryo@n.o.

Add ATF later.


Revision tags: pgoyette-compat-1226 pgoyette-compat-1126
# 1.387 15-Nov-2018 maxv

Remove the 't' argument from m_tag_find().


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.386 02-Sep-2018 maxv

remove reference to ipnat, and duplicate comments


Revision tags: pgoyette-compat-0728
# 1.385 10-Jul-2018 maxv

Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.384 17-May-2018 maxv

branches: 1.384.2;
Add KASSERTs, related to PR/39794.


# 1.383 14-May-2018 maxv

Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.


# 1.382 10-May-2018 maxv

Rename ipsec4_forward -> ipsec_mtu, and switch to void.


Revision tags: pgoyette-compat-0502
# 1.381 26-Apr-2018 maxv

Remove unused mbuf argument from sbsavetimestamp.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.380 15-Apr-2018 maxv

Introduce a m_verify_packet function, that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.


# 1.379 11-Apr-2018 maxv

Don't pass IP_ALLOWBROADCAST in ipsec4_input. The flag lands in
ipsec_getpolicybyaddr, and only IP_FORWARDING is taken.

In fact it would be good to change the 'flags' argument of ipsec4_input
to be a boolean, same for ipsec_getpolicybyaddr. It would be less
misleading.


# 1.378 11-Apr-2018 maxv

Add comment about IPsec.


# 1.377 11-Apr-2018 maxv

Small changes in ip_dooptions: replace bcopy by memcpy, the areas can't
overlap.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.376 24-Feb-2018 ozaki-r

branches: 1.376.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is held while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to protect the network stack surely.
This is required, for example, for PR kern/51356.

Fix PR kern/53043


# 1.375 09-Feb-2018 maxv

Remove dead code.


# 1.374 07-Feb-2018 maxv

Remove null check on ip, it can't be null. (Confuses code scanners.)


# 1.373 06-Feb-2018 maxv

Typos and style a bit, no functional change.


# 1.372 05-Feb-2018 maxv

Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.


# 1.371 05-Feb-2018 maxv

Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.


# 1.370 05-Feb-2018 maxv

Clean up this mess. This is typically the kind of places where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.


# 1.369 05-Feb-2018 maxv

Be tougher, and don't allow LSRR+SSRR (RFC7126).


# 1.368 05-Feb-2018 maxv

Kick duplicate options, they are not allowed (RFC791).
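
Revisions 1.368 and 1.369 tighten option parsing: an option type may not appear twice, and LSRR and SSRR may not be combined. The following is a minimal sketch of both checks over a plain buffer containing only the option bytes. It is not ip_dooptions(), and the function name is hypothetical. It treats NOP as padding that may repeat and stops at EOL. The option numbers are the RFC 791 values (EOL 0, NOP 1, LSRR 131, SSRR 137).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_IPOPT_EOL	0
#define SK_IPOPT_NOP	1
#define SK_IPOPT_LSRR	131
#define SK_IPOPT_SSRR	137

/*
 * Return false if the options are malformed, repeat an option type,
 * or carry both LSRR and SSRR.
 */
static bool ip_opts_ok(const uint8_t *opts, size_t len)
{
	bool seen[256] = { false };
	size_t i = 0;

	assert(opts != NULL || len == 0);
	while (i < len) {
		uint8_t type = opts[i];

		if (type == SK_IPOPT_EOL)
			break;
		if (type == SK_IPOPT_NOP) {	/* padding may repeat */
			i++;
			continue;
		}
		if (i + 1 >= len)		/* no room for a length byte */
			return false;
		uint8_t olen = opts[i + 1];
		if (olen < 2 || olen > len - i)	/* bad or truncated length */
			return false;
		if (seen[type])			/* duplicate option */
			return false;
		seen[type] = true;
		i += olen;
	}
	return !(seen[SK_IPOPT_LSRR] && seen[SK_IPOPT_SSRR]);
}
```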


# 1.367 05-Feb-2018 maxv

Remove unused variable.


# 1.366 05-Feb-2018 maxv

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.


# 1.365 05-Feb-2018 maxv

Style, no functional change.


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as a temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later if we confirmed
nobody does actually.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.2; 1.319.4; 1.319.6; 1.319.10;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong to. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

branches: 1.298.2; 1.298.6; 1.298.8;
check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and I for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the correctness of the statement
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route': rtflush() the source and
rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Before if there was no route, we would call icmp_error with a datagram
packet that has an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
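The cached-limit and drop-half policy described above can be sketched in C roughly as follows. This is a hypothetical illustration, not the ip_input.c code: `struct frag_queue`, `frag_limit()` and `frag_drop_half()` are invented names, and victims are chosen by a simple front-to-back walk over the queues, whereas the kernel may choose which queues to discard by a different criterion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one reassembly queue entry (cf. struct ipq). */
struct frag_queue {
	int nfrags;	/* fragments held in this queue (cf. ipq_nfrags) */
	int live;	/* nonzero while the queue still holds fragments */
};

/* Cached copy of nmbclusters and the limit derived from it. */
static int cached_nmbclusters = -1;
static int maxfrags;

/* Recompute maxfrags only when nmbclusters has changed since last call. */
static int
frag_limit(int nmbclusters)
{
	if (nmbclusters != cached_nmbclusters) {
		cached_nmbclusters = nmbclusters;
		maxfrags = nmbclusters / 4;
	}
	return maxfrags;
}

/*
 * Discard whole queues until at least half of `total` fragments are gone.
 * Returns the number of fragments actually dropped.
 */
static int
frag_drop_half(struct frag_queue *q, size_t n, int total)
{
	int target = total / 2;
	int dropped = 0;

	assert(q != NULL || n == 0);
	for (size_t i = 0; i < n && dropped < target; i++) {
		if (!q[i].live)
			continue;
		dropped += q[i].nfrags;
		q[i].nfrags = 0;
		q[i].live = 0;
	}
	return dropped;
}
```

With 1024 clusters the derived limit is 256 fragments; a caller would invoke `frag_drop_half()` once the total fragment count exceeds that limit. Dropping whole queues rather than single fragments avoids keeping partially reassembled packets that can never complete.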


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add an IP(v4)-specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() corrides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
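A minimal sketch of the 127/8 check described above, assuming addresses are already in host byte order. `in_loopback_net()` and `ip_wire_addrs_ok()` are hypothetical names, and exempting packets received on a loopback interface (where 127/8 traffic is legitimate) via the `rcvif_is_loopback` flag is an assumption; this is not a transcription of ip_input.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr (host byte order) lies in 127.0.0.0/8. */
static bool
in_loopback_net(uint32_t addr)
{
	return (addr >> 24) == 127;
}

/*
 * A packet that arrived from the wire (not via a loopback interface)
 * must not carry a 127/8 source or destination address (RFC 1122).
 * rcvif_is_loopback is a hypothetical stand-in for the kernel's test
 * of the receiving interface.
 */
static bool
ip_wire_addrs_ok(uint32_t src, uint32_t dst, bool rcvif_is_loopback)
{
	if (rcvif_is_loopback)
		return true;
	return !in_loopback_net(src) && !in_loopback_net(dst);
}
```

A caller would count a failing packet as a bad address (cf. ipstat.ips_badaddr in 1.130) and drop it.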


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets, that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
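To illustrate the check: ip_hl counts 32-bit words, so the header occupies ip_hl << 2 bytes, and a datagram whose total length is shorter than its own header cannot be valid. The helper below, `ip_len_covers_header()`, is a hypothetical name that takes already-decoded field values (host byte order) rather than a `struct ip`; the kernel performs other length checks as well, which are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * ip_hl is a 4-bit count of 32-bit words (at most 15, i.e. 60 bytes).
 * Return false if the total length ip_len is smaller than the header
 * length implied by ip_hl, in which case the packet should be dropped.
 */
static bool
ip_len_covers_header(uint16_t ip_len, uint8_t ip_hl)
{
	unsigned int hlen = (unsigned int)ip_hl << 2;	/* header bytes */

	return ip_len >= hlen;
}
```

For example, a packet claiming ip_hl = 15 (a 60-byte header with options) but ip_len = 40 fails the test.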


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
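The incremental checksum adjustment mentioned above can be illustrated with RFC 1624's update rule. This is a sketch of the technique, not the ipflow_fastforward() code: the function names are hypothetical, and for readability the header is held as ten host-order 16-bit words rather than as a network-order struct ip.

```c
#include <stddef.h>
#include <stdint.h>

/* One's complement sum folded to 16 bits, then complemented (RFC 1071). */
static uint16_t
inet_cksum_words(const uint16_t *w, size_t nwords)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < nwords; i++)
		sum += w[i];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Incremental update per RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'),
 * where m is the old value of one 16-bit header word and m_new its
 * replacement.  Gives the same result as a full recomputation.
 */
static uint16_t
inet_cksum_adjust(uint16_t hc, uint16_t m, uint16_t m_new)
{
	uint32_t sum = (uint16_t)~hc;

	sum += (uint16_t)~m;
	sum += m_new;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Decrement the TTL of a 20-byte IPv4 header stored as ten host-order
 * 16-bit words.  Word 4 holds TTL (high byte) and protocol (low byte);
 * word 5 holds the header checksum.  Assumes TTL > 0.
 */
static void
ip_ttl_decrement(uint16_t hdr[10])
{
	uint16_t old = hdr[4];
	uint16_t new_word = (uint16_t)(old - 0x0100);

	hdr[4] = new_word;
	hdr[5] = inet_cksum_adjust(hdr[5], old, new_word);
}
```

For the header 4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7, the full checksum is 0xb861; after ip_ttl_decrement() the TTL/protocol word is 0x3f11 and the checksum is 0xb961, the same value a full recomputation gives, but obtained without re-summing the header.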


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Significantly improves performance on hosts with a large number of IP addresses.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
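One way to apply such a mask when deciding whether a datagram is a fragment is sketched below. The helper ip_is_fragment() is hypothetical and assumes ip_off is in host byte order; the flag values match those in <netinet/ip.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the host-order ip_off field; values as in <netinet/ip.h>. */
#ifndef IP_RF
#define IP_RF		0x8000	/* reserved fragment flag */
#endif
#ifndef IP_DF
#define IP_DF		0x4000	/* don't fragment flag */
#endif
#ifndef IP_MF
#define IP_MF		0x2000	/* more fragments flag */
#endif
#ifndef IP_OFFMASK
#define IP_OFFMASK	0x1fff	/* mask for fragment offset bits */
#endif

/*
 * Hypothetical helper: a datagram is a fragment only if MF is set or
 * the fragment offset is nonzero.  Ignoring DF and RF means a sender
 * that sets the reserved bit is not mistaken for a fragment.
 */
static bool
ip_is_fragment(uint16_t ip_off)
{
	return (ip_off & (IP_MF | IP_OFFMASK)) != 0;
}
```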


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.397 28-Aug-2020 ozaki-r

inet: reduce silent packet discards


# 1.396 28-Aug-2020 ozaki-r

inet: pull m_get_rcvif_psref out of ip_input for simplicity

Same as ip6_input.


# 1.395 28-Aug-2020 ozaki-r

ipsec: rename ipsec_ip_input to ipsec_ip_input_checkpolicy

Because it just checks if a packet passes security policies.


# 1.394 28-Aug-2020 ozaki-r

inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406 ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.393 13-Nov-2019 ozaki-r

Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.


# 1.392 19-Sep-2019 ozaki-r

Apply some missing changes lost on the previous commit


# 1.391 19-Sep-2019 ozaki-r

Avoid having a rtcache directly in a percpu storage

percpu(9) has a memory area for each CPU and hands it out piece by piece
to users. If the areas run short, percpu(9) enlarges them by allocating new,
larger memory areas, replacing the old ones with them and destroying the old ones.
A percpu storage referenced by a pointer obtained via percpu_getref can be
destroyed by this mechanism after the running thread sleeps, even if
percpu_putref has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.390 15-Sep-2019 bouyer

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.389 13-May-2019 ozaki-r

branches: 1.389.2;
Count packets dropped by pfil


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.388 17-Jan-2019 knakahara

Fix ipsecif(4) being unable to apply input-direction packet filters. Reviewed by ozaki-r@n.o and ryo@n.o.

Add ATF later.


Revision tags: pgoyette-compat-1226 pgoyette-compat-1126
# 1.387 15-Nov-2018 maxv

Remove the 't' argument from m_tag_find().


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.386 02-Sep-2018 maxv

remove reference to ipnat, and duplicate comments


Revision tags: pgoyette-compat-0728
# 1.385 10-Jul-2018 maxv

Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.384 17-May-2018 maxv

branches: 1.384.2;
Add KASSERTs, related to PR/39794.


# 1.383 14-May-2018 maxv

Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.


# 1.382 10-May-2018 maxv

Rename ipsec4_forward -> ipsec_mtu, and switch to void.


Revision tags: pgoyette-compat-0502
# 1.381 26-Apr-2018 maxv

Remove unused mbuf argument from sbsavetimestamp.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.380 15-Apr-2018 maxv

Introduce a m_verify_packet function, that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.


# 1.379 11-Apr-2018 maxv

Don't pass IP_ALLOWBROADCAST in ipsec4_input. The flag lands in
ipsec_getpolicybyaddr, and only IP_FORWARDING is taken.

In fact it would be good to change the 'flags' argument of ipsec4_input
to be a boolean, same for ipsec_getpolicybyaddr. It would be less
misleading.


# 1.378 11-Apr-2018 maxv

Add comment about IPsec.


# 1.377 11-Apr-2018 maxv

Small changes in ip_dooptions: replace bcopy by memcpy, the areas can't
overlap.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.376 24-Feb-2018 ozaki-r

branches: 1.376.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is taken while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to reliably protect the network stack.
This is required, for example, for PR kern/51356.

Fix PR kern/53043


# 1.375 09-Feb-2018 maxv

Remove dead code.


# 1.374 07-Feb-2018 maxv

Remove null check on ip, it can't be null. (Confuses code scanners.)


# 1.373 06-Feb-2018 maxv

Typos and style a bit, no functional change.


# 1.372 05-Feb-2018 maxv

Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.


# 1.371 05-Feb-2018 maxv

Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.


# 1.370 05-Feb-2018 maxv

Clean up this mess. This is typically the kind of places where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.


# 1.369 05-Feb-2018 maxv

Be tougher, and don't allow LSRR+SSRR (RFC7126).


# 1.368 05-Feb-2018 maxv

Kick duplicate options, they are not allowed (RFC791).


# 1.367 05-Feb-2018 maxv

Remove unused variable.


# 1.366 05-Feb-2018 maxv

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.


# 1.365 05-Feb-2018 maxv

Style, no functional change.


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-paste code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, for example by reference
counting or passive references. Regardless of the method, we need to call
some release function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is
separated out to avoid one big change that would be difficult to debug by
bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later if we confirmed
nobody does actually.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant as a
transitional measure, i.e., for places that are not planned to be
MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.2; 1.319.4; 1.319.6; 1.319.10;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use an array to store hooks and thus be more cache
friendly (there are only a few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

branches: 1.298.2; 1.298.6; 1.298.8;
check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so the correctness of the statement
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route': rtflush() the source and
rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still a work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; its CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Before if there was no route, we would call icmp_error with a datagram
packet that has an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() corrides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

backout previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port numbers assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove_hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there, so we patch them up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-address input/output statistics. Currently only netatalk
and netinet are supported; so far it has only been tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl << 2 and drop packet accordingly
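A minimal sketch of the length checks this commit is about, assuming a raw IPv4 header in a flat byte buffer (the function name and the caplen parameter are hypothetical; the real code works on an mbuf and struct ip). The ip_len < hlen test is the one the commit adds; the others are the usual companion checks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* pkt points at an IPv4 header; caplen is the number of bytes actually received. */
static bool
ip_hdr_lengths_ok(const uint8_t *pkt, size_t caplen)
{
	if (caplen < 20)
		return false;                          /* shorter than a minimal header */
	if ((pkt[0] >> 4) != 4)
		return false;                          /* not IPv4 */
	size_t hlen = (size_t)(pkt[0] & 0x0f) << 2;    /* ip_hl << 2 */
	size_t len = ((size_t)pkt[2] << 8) | pkt[3];   /* ip_len, network byte order */
	if (hlen < 20 || hlen > caplen)
		return false;                          /* bad header length */
	if (len < hlen)
		return false;                          /* ip_len < ip_hl << 2: drop */
	if (len > caplen)
		return false;                          /* truncated packet */
	return true;
}
```

Without the ip_len < hlen test, a later computation such as ip_len - hlen for the payload length would underflow.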


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix off-by-one bug (OBOB) in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
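The incremental checksum adjustment mentioned above can be illustrated with the RFC 1624 update HC' = ~(~HC + ~m + m'). This is a sketch, not the ipflow code itself: the header is modelled as ten 16-bit words already converted to host order, the helper names are hypothetical, and the caller is assumed to have checked that TTL > 1.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of 16-bit words, folded back to 16 bits. */
static uint16_t
ocsum(const uint16_t *w, size_t nwords)
{
	uint32_t sum = 0;
	for (size_t i = 0; i < nwords; i++)
		sum += w[i];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full recomputation; word 5 (the checksum field) is treated as zero. */
static uint16_t
ip_cksum_full(const uint16_t hdr[10])
{
	uint16_t tmp[10];
	for (int i = 0; i < 10; i++)
		tmp[i] = hdr[i];
	tmp[5] = 0;
	return (uint16_t)~ocsum(tmp, 10);
}

/* RFC 1624 eqn. 3: update checksum hc when one word changes from m to m_new. */
static uint16_t
cksum_adjust(uint16_t hc, uint16_t m, uint16_t m_new)
{
	uint32_t sum = (uint16_t)~hc;
	sum += (uint16_t)~m;
	sum += m_new;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Word 4 holds TTL (high byte) and protocol (low byte); assumes TTL > 1. */
static void
ip_dec_ttl(uint16_t hdr[10])
{
	uint16_t old = hdr[4];
	hdr[4] = (uint16_t)(old - 0x0100);
	hdr[5] = cksum_adjust(hdr[5], old, hdr[4]);
}
```

For the header 4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7, decrementing the TTL from 0x40 to 0x3f turns the checksum from 0xb861 into 0xb961, the same value a full recomputation gives, while touching only one word.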


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
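The effect of masking out IP_RF can be sketched as below. The flag values are the conventional BSD ones from <netinet/ip.h>, redefined here so the snippet stands alone; is_fragment_old() is a hypothetical check of the kind the commit describes, not the literal pre-fix code, and ip_off is assumed to be in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_RF      0x8000  /* reserved (must be zero) flag */
#define IP_DF      0x4000  /* don't fragment */
#define IP_MF      0x2000  /* more fragments */
#define IP_OFFMASK 0x1fff  /* fragment offset field */

/* Assumes RF is always zero: any bit other than DF marks a fragment. */
static bool
is_fragment_old(uint16_t ip_off)
{
	return (ip_off & ~IP_DF) != 0;
}

/* Mask out RF too, so only MF or a nonzero offset marks a fragment. */
static bool
is_fragment(uint16_t ip_off)
{
	return (ip_off & ~(IP_DF | IP_RF)) != 0;
}
```

A datagram that arrives with only the reserved bit set is then processed as a whole datagram instead of being sent to the reassembly code.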


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlaid with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.393 13-Nov-2019 ozaki-r

Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.


# 1.392 19-Sep-2019 ozaki-r

Apply some missing changes lost on the previous commit


# 1.391 19-Sep-2019 ozaki-r

Avoid having a rtcache directly in a percpu storage

percpu(9) reserves a region of memory for each CPU and hands it out piece by piece
to users. If the storage runs short, percpu(9) enlarges it by allocating new,
larger memory areas, replacing the old ones with them and destroying the old ones.
A percpu storage referenced by a pointer obtained via percpu_getref can be
destroyed by this mechanism after the running thread sleeps, even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.390 15-Sep-2019 bouyer

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.389 13-May-2019 ozaki-r

branches: 1.389.2;
Count packets dropped by pfil


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.388 17-Jan-2019 knakahara

Fix ipsecif(4) being unable to apply input-direction packet filters. Reviewed by ozaki-r@n.o and ryo@n.o.

Add ATF later.


Revision tags: pgoyette-compat-1226 pgoyette-compat-1126
# 1.387 15-Nov-2018 maxv

Remove the 't' argument from m_tag_find().


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.386 02-Sep-2018 maxv

remove reference to ipnat, and duplicate comments


Revision tags: pgoyette-compat-0728
# 1.385 10-Jul-2018 maxv

Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.384 17-May-2018 maxv

branches: 1.384.2;
Add KASSERTs, related to PR/39794.


# 1.383 14-May-2018 maxv

Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.


# 1.382 10-May-2018 maxv

Rename ipsec4_forward -> ipsec_mtu, and switch to void.


Revision tags: pgoyette-compat-0502
# 1.381 26-Apr-2018 maxv

Remove unused mbuf argument from sbsavetimestamp.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.380 15-Apr-2018 maxv

Introduce a m_verify_packet function, that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.


# 1.379 11-Apr-2018 maxv

Don't pass IP_ALLOWBROADCAST in ipsec4_input. The flag lands in
ipsec_getpolicybyaddr, and only IP_FORWARDING is taken.

In fact it would be good to change the 'flags' argument of ipsec4_input
to be a boolean, same for ipsec_getpolicybyaddr. It would be less
misleading.


# 1.378 11-Apr-2018 maxv

Add comment about IPsec.


# 1.377 11-Apr-2018 maxv

Small changes in ip_dooptions: replace bcopy by memcpy, the areas can't
overlap.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.376 24-Feb-2018 ozaki-r

branches: 1.376.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because the lock ordering rule is violated:
softnet_lock is taken while holding IFNET_LOCK.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to protect the network stack reliably.
This is required, for example, for PR kern/51356.

Fix PR kern/53043


# 1.375 09-Feb-2018 maxv

Remove dead code.


# 1.374 07-Feb-2018 maxv

Remove null check on ip, it can't be null. (Confuses code scanners.)


# 1.373 06-Feb-2018 maxv

Typos and style a bit, no functional change.


# 1.372 05-Feb-2018 maxv

Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.


# 1.371 05-Feb-2018 maxv

Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.


# 1.370 05-Feb-2018 maxv

Clean up this mess. This is typically the kind of places where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.


# 1.369 05-Feb-2018 maxv

Be tougher, and don't allow LSRR+SSRR (RFC7126).


# 1.368 05-Feb-2018 maxv

Kick duplicate options, they are not allowed (RFC791).
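The two option-list rules above (1.368: no repeated options; 1.369: no LSRR together with SSRR) can be sketched as a single pass over the options area. This is a simplified illustration, not ip_dooptions(): the function name is hypothetical, option bodies are not validated, and cp/cnt are assumed to cover only the bytes after the fixed 20-byte header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL  0    /* end of option list */
#define OPT_NOP  1    /* no operation (may repeat) */
#define OPT_LSRR 131  /* loose source and record route */
#define OPT_SSRR 137  /* strict source and record route */

/* False if the list is malformed, repeats an option, or has both LSRR and SSRR. */
static bool
ip_opts_acceptable(const uint8_t *cp, size_t cnt)
{
	bool seen[256] = { false };
	size_t optlen = 0;

	for (; cnt > 0; cnt -= optlen, cp += optlen) {
		uint8_t opt = cp[0];
		if (opt == OPT_EOL)
			break;
		if (opt == OPT_NOP) {
			optlen = 1;
			continue;
		}
		if (cnt < 2)
			return false;              /* no room for the length byte */
		optlen = cp[1];
		if (optlen < 2 || optlen > cnt)
			return false;              /* bogus option length */
		if (seen[opt])
			return false;              /* duplicate option */
		seen[opt] = true;
	}
	return !(seen[OPT_LSRR] && seen[OPT_SSRR]);
}
```

NOP is deliberately not tracked, since it is used as padding and legitimately appears more than once.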


# 1.367 05-Feb-2018 maxv

Remove unused variable.


# 1.366 05-Feb-2018 maxv

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.


# 1.365 05-Feb-2018 maxv

Style, no functional change.


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies of SO_TIMESTAMP/SO_OTIMESTAMP processing into a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes it difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead
we have to get the object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant as a
transitional measure, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance-sensitive paths,
but the overhead is not serious: 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.2; 1.319.4; 1.319.6; 1.319.10;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on IPsec being enabled and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.
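The compile-time check described above can be approximated with token pasting and one typed identity function per type tag. This is a sketch of the technique under assumed names (T_INT, T_QUAD, verify_*, MAKE_NODE), not the actual <sys/sysctl.h> macros.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder type tags; values are illustrative, not NetBSD's. */
enum { T_INT = 1, T_QUAD = 2 };

/* One identity function per tag. The parameter type does the checking:
   passing a uint64_t * where an int * is expected is an
   incompatible-pointer diagnostic (an error under -Werror). */
static inline void *verify_T_INT(int *p)       { return p; }
static inline void *verify_T_QUAD(uint64_t *p) { return p; }

struct node { int type; void *data; };

static struct node make_node(int type, void *data)
{
    struct node n = { type, data };
    return n;
}

/* Token pasting picks the verifier that matches the declared tag. */
#define MAKE_NODE(type, ptr) make_node((type), verify_##type(ptr))
```

With q declared as uint64_t, MAKE_NODE(T_INT, &q) draws a compiler diagnostic, which is the uint64_t-typed-as-CTLTYPE_INT mistake the change was meant to catch.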


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

branches: 1.298.2; 1.298.6; 1.298.8;
check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
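The deferred-drain pattern can be sketched as follows: the drain hook only sets a flag, and the periodic fast timeout does the actual freeing. frag_drain(), frag_fasttimo() and the counter are hypothetical stand-ins for the reassembly code, and a C11 atomic is assumed for the flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the IP reassembly state. */
static atomic_bool frag_drain_needed;  /* static, so starts out false */
static int frag_queued = 3;            /* pretend three fragments are queued */

/* Called from the memory-shortage path, possibly with locks held:
   only record that work is wanted. */
static void frag_drain(void)
{
    atomic_store(&frag_drain_needed, true);
}

/* Periodic fast timeout: runs where taking the reassembly lock is safe,
   so it performs the deferred work and clears the request. */
static void frag_fasttimo(void)
{
    if (atomic_exchange(&frag_drain_needed, false))
        frag_queued = 0;               /* stands in for freeing all fragments */
}
```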


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.
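Reading the TTL delivered by IP_RECVTTL looks roughly like this. The sketch assumes the BSD layout described above: a control message with cmsg_level IPPROTO_IP, cmsg_type IP_RECVTTL and a single byte of data. Linux, for example, reports the TTL as an IP_TTL message carrying an int, so this is not portable as written. find_recvttl() is a hypothetical helper.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Scan the control data returned by recvmsg(2) for the TTL option.
   Returns 0 and stores the TTL on success, -1 if it is not present. */
static int find_recvttl(struct msghdr *msg, uint8_t *ttl)
{
    for (struct cmsghdr *cm = CMSG_FIRSTHDR(msg); cm != NULL;
         cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == IPPROTO_IP && cm->cmsg_type == IP_RECVTTL &&
            cm->cmsg_len >= CMSG_LEN(sizeof(uint8_t))) {
            memcpy(ttl, CMSG_DATA(cm), sizeof(uint8_t));
            return 0;
        }
    }
    return -1;
}
```

The option is enabled beforehand with setsockopt(s, IPPROTO_IP, IP_RECVTTL, &on, sizeof(on)), where on is a nonzero int, and msg_control must point at a suitably aligned buffer of at least CMSG_SPACE(sizeof(uint8_t)) bytes.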


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
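Per-CPU counters with collation on read can be sketched as below. Each CPU increments only its own row of a uint64_t array (the array-of-counters layout introduced in 1.264), and the sysctl path sums the rows. NCPU, the stat indices and the function names are assumptions for the sketch. A kernel would also have to keep the thread on one CPU for the duration of the increment.

```c
#include <stdint.h>

/* Hypothetical subset of counter indices; stats live in a plain
   uint64_t array indexed by an enum rather than in a struct. */
enum { IP_STAT_TOTAL, IP_STAT_BADSUM, IP_STAT_FORWARD, IP_NSTATS };

#define NCPU 4 /* fixed CPU count for the sketch */

static uint64_t ip_percpu[NCPU][IP_NSTATS];

/* Hot path: a CPU bumps only its own row, so no lock or atomic is needed
   as long as the caller cannot migrate mid-update. */
static void ip_stat_inc(int cpu, int stat)
{
    ip_percpu[cpu][stat]++;
}

/* Slow path (sysctl read): sum every CPU's row into one array. The result
   is a snapshot, not an atomic view across CPUs. */
static void ip_stat_collate(uint64_t out[IP_NSTATS])
{
    for (int s = 0; s < IP_NSTATS; s++) {
        out[s] = 0;
        for (int c = 0; c < NCPU; c++)
            out[s] += ip_percpu[c][s];
    }
}
```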


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and I for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
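One plausible reading of a "Fisher-Yates shuffle over a sliding window" is sketched below. The whole 16-bit ID space is shuffled once. Then each new ID is taken from a moving index and swapped with a random entry up to 32767 slots behind that index. A returned ID is therefore parked behind the index and cannot come up again until the index has advanced at least 32769 more slots. The window size, the skipping of ID 0, and all names are assumptions. xorshift32 stands in for a cryptographic generator such as arc4random purely so the sketch is deterministic.

```c
#include <stdint.h>

#define ID_SPACE    65536u   /* every 16-bit IP id */
#define WINDOW_MASK 0x7fffu  /* swap partner is 0..32767 slots behind */

static uint16_t id_shuffle[ID_SPACE];
static uint32_t id_index;
static uint32_t rng_state = 2463534242u;  /* xorshift32 seed; NOT secure */

/* Deterministic stand-in for a cryptographic RNG such as arc4random. */
static uint32_t rng_next(void)
{
    uint32_t x = rng_state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return rng_state = x;
}

/* "Inside-out" Fisher-Yates: fills id_shuffle with a random permutation
   of 0..65535 (the % reduction adds a slight bias). */
static void id_init(void)
{
    for (uint32_t i = 0; i < ID_SPACE; i++) {
        uint32_t j = rng_next() % (i + 1);

        id_shuffle[i] = id_shuffle[j];
        id_shuffle[j] = (uint16_t)i;
    }
    id_index = 0;
}

/* Return the id at the moving index, then swap it into a random slot
   up to 32767 positions behind the index. */
static uint16_t id_next(void)
{
    uint16_t r;

    do {
        uint32_t i = id_index & 0xffffu;
        uint32_t j = (id_index - (rng_next() & WINDOW_MASK)) & 0xffffu;

        r = id_shuffle[i];
        id_shuffle[i] = id_shuffle[j];
        id_shuffle[j] = r;
        id_index++;
    } while (r == 0);          /* assumption: id 0 is never handed out */
    return r;
}
```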


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC, spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement's correctness
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
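The chaining and invalidation scheme can be sketched with an intrusive doubly linked list. struct route_cache, cache_set(), cache_flush() and cache_flush_all() are hypothetical stand-ins that mirror the roles of in_rtcache(), in_rtflush() and in_rtflushall(). The route entry's reference count is simplified to a plain int, and caches are assumed to start zero-initialized.

```c
#include <stddef.h>

struct rtentry_s { int refcnt; };          /* stand-in for a routing entry */

struct route_cache {                        /* hypothetical cache object */
    struct rtentry_s *rt;                   /* cached route or NULL */
    struct route_cache *next, **prevp;      /* chain of live caches */
};

static struct route_cache *cache_chain;     /* all caches with rt != NULL */

/* Drop one cache's route and unlink it (cf. in_rtflush()). */
static void cache_flush(struct route_cache *ro)
{
    if (ro->rt == NULL)
        return;
    ro->rt->refcnt--;
    ro->rt = NULL;
    *ro->prevp = ro->next;
    if (ro->next != NULL)
        ro->next->prevp = ro->prevp;
    ro->next = NULL;
    ro->prevp = NULL;
}

/* Remember a route in 'ro' and link 'ro' onto the chain (cf. in_rtcache()). */
static void cache_set(struct route_cache *ro, struct rtentry_s *rt)
{
    cache_flush(ro);                        /* never link a cache twice */
    ro->rt = rt;
    rt->refcnt++;
    ro->next = cache_chain;
    if (cache_chain != NULL)
        cache_chain->prevp = &ro->next;
    ro->prevp = &cache_chain;
    cache_chain = ro;
}

/* Invalidate every cache, e.g. after a route is added (cf. in_rtflushall()). */
static void cache_flush_all(void)
{
    while (cache_chain != NULL)
        cache_flush(cache_chain);
}
```

Users of a cache must then treat ro->rt == NULL as "look the route up again", as the entry above notes for ipflow_fastforward().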


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; its CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Before if there was no route, we would call icmp_error with a datagram
packet that has an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we already pull up hlen bytes into the mbuf,
so these later pullups are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
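The policy described in this entry can be sketched roughly as follows. This is a hypothetical, simplified model and not the committed kernel code. The per-packet fragment-count array and the names `drop_half()`, `ip_check_frag_limits()` and `ip_reass_maybe_drain()` are illustrative assumptions.

The sketch recomputes `ip_maxfrags` only when the cached `nmbclusters` value has gone stale. When the limit is exceeded, it frees whole reassembly queues, oldest first, until at most half of the fragments remain. Choosing oldest-first is itself an assumption; the real code may choose victims differently.

```c
#include <stddef.h>

/* Stand-ins for the kernel globals named in the log (values are made up). */
static int nmbclusters = 1024;
static int ip_nmbclusters;	/* cached copy of nmbclusters */
static int ip_maxfrags;		/* threshold for the drop-half phase */

/* Recompute derived limits only if nmbclusters changed since last time. */
static void ip_check_frag_limits(void)
{
	if (ip_nmbclusters != nmbclusters) {
		ip_nmbclusters = nmbclusters;
		ip_maxfrags = nmbclusters / 4;
	}
}

/*
 * nfrags[i] is the fragment count of the i-th queued packet, oldest first.
 * Free whole packets until no more than half of the original total remains;
 * returns the number of fragments still queued.
 */
static int drop_half(int nfrags[], size_t nq, int total)
{
	int target = total / 2;

	for (size_t i = 0; i < nq && total > target; i++) {
		total -= nfrags[i];
		nfrags[i] = 0;	/* packet i and all of its fragments freed */
	}
	return total;
}

/* Hypothetical hook run after a fragment is enqueued. */
static int ip_reass_maybe_drain(int nfrags[], size_t nq, int total)
{
	ip_check_frag_limits();
	if (total > ip_maxfrags)
		total = drop_half(nfrags, nq, total);
	return total;
}
```

Because whole packets are freed, a single drain can leave fewer than half of the fragments queued. That overshoot is what makes the drop multiplicative rather than one-at-a-time.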


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

back out previous; we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench messages, based on the router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
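The 127/8 test described in this entry reduces to inspecting the top octet of each address in host byte order. The following is a minimal standalone sketch, not the committed code; `in_loopback_net()` and `ip_reject_loopback_on_wire()` are hypothetical names. It assumes the addresses arrive in network byte order, as they do in `struct ip`. It also assumes a real input path would apply the check only to packets that did not arrive on a loopback interface, since such packets legitimately carry 127/8 addresses.

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>	/* ntohl() */

/* True if addr (network byte order) lies in 127.0.0.0/8. */
static bool in_loopback_net(uint32_t addr)
{
	return (ntohl(addr) >> 24) == 127;
}

/*
 * Hypothetical wire-side check: a packet received on a non-loopback
 * interface with a 127/8 source or destination should be dropped.
 */
static bool ip_reject_loopback_on_wire(uint32_t src, uint32_t dst)
{
	return in_loopback_net(src) || in_loopback_net(dst);
}
```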


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port numbers assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL
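The effect of hostzerobroadcast on subnet broadcast matching can be sketched as below. This is an illustrative helper, not the kernel's in_broadcast(); the function name, the host-byte-order arguments and the restriction to subnet (not limited) broadcasts are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: is addr (host byte order) a broadcast address for
 * the subnet net/mask? An all-ones host part always is; an all-zeros host
 * part counts only while hostzerobroadcast is non-zero (the default).
 */
bool
is_subnet_broadcast(uint32_t addr, uint32_t net, uint32_t mask,
    int hostzerobroadcast)
{
	if ((addr & mask) != (net & mask))
		return false;		/* not on this subnet */
	if ((addr | mask) == 0xffffffffU)
		return true;		/* host part is all ones */
	return hostzerobroadcast != 0 && (addr & ~mask) == 0;
}
```

For 192.168.1.0/24, 192.168.1.0 matches only while hostzerobroadcast is non-zero; 192.168.1.255 matches either way.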


# 1.87 04-May-1999 hwr

It does not make much sense to increase an "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets that have a Class-D address as the source address.
Implements the first half of PR 7003.
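The Class-D test amounts to checking the top four bits of the source address, as the IN_MULTICAST() macro does. A minimal sketch, assuming the address has already been converted to host byte order; the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if src_host (host byte order) lies in
 * 224.0.0.0/4 (Class D, multicast), in which case the packet is dropped.
 */
bool
src_is_classd(uint32_t src_host)
{
	return (src_host & 0xf0000000U) == 0xe0000000U;
}
```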


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently supports only netatalk
and netinet, and has been tested only under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
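The header-length test can be sketched as a standalone predicate. Only the `ip_len < hlen` comparison comes from this commit; the version, minimum-length and truncation checks, the helper name and the IP_MINHLEN constant are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MINHLEN 20	/* hypothetical: sizeof(struct ip) without options */

/*
 * vhl is the first header byte (version << 4 | header length in 32-bit
 * words); ip_len is the total length in host order; bytes_received is
 * how much data actually arrived.
 */
bool
ip_lengths_ok(uint8_t vhl, uint16_t ip_len, uint32_t bytes_received)
{
	unsigned int hlen = (unsigned int)(vhl & 0x0f) << 2;

	if ((vhl >> 4) != 4)
		return false;		/* not IPv4 */
	if (hlen < IP_MINHLEN)
		return false;		/* header shorter than the minimum */
	if (ip_len < hlen)
		return false;		/* total length smaller than header */
	if (ip_len > bytes_received)
		return false;		/* truncated packet */
	return true;
}
```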


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
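Adjusting the checksum incrementally means patching it for the one 16-bit word that changed instead of re-summing the whole header. The sketch below uses RFC 1624 equation 3, HC' = ~(~HC + ~m + m'); the kernel's ipflow code may use different but equivalent arithmetic, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit ones'-complement accumulator down to 16 bits. */
static uint16_t
fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full Internet checksum over a header held as network-order bytes. */
uint16_t
ip_cksum_full(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)hdr[i] << 8 | hdr[i + 1];
	return (uint16_t)~fold(sum);
}

/*
 * Decrement the TTL (byte 8) and patch the checksum (bytes 10-11).
 * m_old and m_new are the 16-bit word holding TTL and protocol.
 */
void
ip_ttl_decrement(uint8_t *hdr)
{
	uint16_t m_old = (uint16_t)(hdr[8] << 8 | hdr[9]);
	uint16_t hc = (uint16_t)(hdr[10] << 8 | hdr[11]);
	uint16_t m_new, hc_new;

	hdr[8]--;
	m_new = (uint16_t)(hdr[8] << 8 | hdr[9]);
	hc_new = (uint16_t)~fold((uint32_t)(uint16_t)~hc +
	    (uint16_t)~m_old + m_new);
	hdr[10] = (uint8_t)(hc_new >> 8);
	hdr[11] = (uint8_t)(hc_new & 0xff);
}
```

After the update, summing the whole header including the new checksum still yields zero, so receivers accept the packet without the forwarder ever re-summing it.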


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Significantly improves performance on hosts with a large number of IP addresses.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
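The constraints on the two sysctls can be expressed as one validation function. A sketch only: the function name is hypothetical, `allow_priv` stands in for the IPNOPRIVPORTS option, and the lower bound of 0 is an added assumption. IPPORT_RESERVED is 1024, as in <netinet/in.h>.

```c
#include <stdbool.h>

#define IPPORT_RESERVED 1024	/* ports below this are privileged */

/* Would the proposed [min, max] ephemeral port range be accepted? */
bool
anonport_range_ok(int min, int max, bool allow_priv)
{
	if (min < 0 || max < 0 || min > 65535 || max > 65535)
		return false;		/* outside the 16-bit port space */
	if (!allow_priv && (min < IPPORT_RESERVED || max < IPPORT_RESERVED))
		return false;		/* privileged ports not allowed */
	return min < max;		/* anonportmin must be < anonportmax */
}
```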


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>
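The length sanity check comes down to computing where a fragment would end in the reassembled datagram. A minimal sketch with a hypothetical helper; the actual reassembly code tracks more state. IP_MAXPACKET is 65535, as in <netinet/ip.h>.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MAXPACKET 65535	/* maximum IPv4 datagram size */

/*
 * off: fragment offset in 8-byte units (ip_off & IP_OFFMASK);
 * hlen: header length in bytes; fraglen: fragment payload in bytes.
 * Returns false if reassembly would exceed IP_MAXPACKET.
 */
bool
frag_fits(uint16_t off, unsigned int hlen, unsigned int fraglen)
{
	uint32_t end = (uint32_t)off * 8 + hlen + fraglen;

	return end <= IP_MAXPACKET;
}
```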


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
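The difference between the two fragment tests can be sketched as follows. The "buggy" variant is modeled on a 4.4BSD-style `ip_off & ~IP_DF` check; treat that as an assumption about the pre-fix code, not a quote of it. Flag values match RFC 791 and the BSD <netinet/ip.h> definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_RF      0x8000	/* reserved fragment flag */
#define IP_DF      0x4000	/* don't fragment */
#define IP_MF      0x2000	/* more fragments */
#define IP_OFFMASK 0x1fff	/* fragment offset, in 8-byte units */

/* Assumes every bit other than DF implies fragmentation, including RF. */
bool
is_fragment_buggy(uint16_t ip_off)
{
	return (ip_off & ~IP_DF) != 0;
}

/* Only MF or a non-zero offset make a packet a fragment. */
bool
is_fragment(uint16_t ip_off)
{
	return (ip_off & (IP_MF | IP_OFFMASK)) != 0;
}
```

A packet with only the reserved bit set is sent into reassembly by the first test but passed through by the second.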


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlaid with a
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.392 19-Sep-2019 ozaki-r

Apply some missing changes lost on the previous commit


# 1.391 19-Sep-2019 ozaki-r

Avoid having a rtcache directly in a percpu storage

percpu(9) reserves a memory area for each CPU and hands it out piece by piece
to users. When that storage runs short, percpu(9) grows it by allocating new,
larger memory areas, replacing the old ones with them and destroying the old
ones. Consequently, a percpu storage referenced by a pointer obtained via
percpu_getref can be destroyed by this mechanism once the running thread
sleeps, even if percpu_putref has not yet been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock, so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by storing
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.390 15-Sep-2019 bouyer

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.389 13-May-2019 ozaki-r

branches: 1.389.2;
Count packets dropped by pfil


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.388 17-Jan-2019 knakahara

Fix ipsecif(4) being unable to apply input-direction packet filters. Reviewed by ozaki-r@n.o and ryo@n.o.

Add ATF later.


Revision tags: pgoyette-compat-1226 pgoyette-compat-1126
# 1.387 15-Nov-2018 maxv

Remove the 't' argument from m_tag_find().


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.386 02-Sep-2018 maxv

remove reference to ipnat, and duplicate comments


Revision tags: pgoyette-compat-0728
# 1.385 10-Jul-2018 maxv

Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.384 17-May-2018 maxv

branches: 1.384.2;
Add KASSERTs, related to PR/39794.


# 1.383 14-May-2018 maxv

Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.


# 1.382 10-May-2018 maxv

Rename ipsec4_forward -> ipsec_mtu, and switch to void.


Revision tags: pgoyette-compat-0502
# 1.381 26-Apr-2018 maxv

Remove unused mbuf argument from sbsavetimestamp.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.380 15-Apr-2018 maxv

Introduce a m_verify_packet function, that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.


# 1.379 11-Apr-2018 maxv

Don't pass IP_ALLOWBROADCAST in ipsec4_input. The flag lands in
ipsec_getpolicybyaddr, and only IP_FORWARDING is taken.

In fact it would be good to change the 'flags' argument of ipsec4_input
to be a boolean, same for ipsec_getpolicybyaddr. It would be less
misleading.


# 1.378 11-Apr-2018 maxv

Add comment about IPsec.


# 1.377 11-Apr-2018 maxv

Small changes in ip_dooptions: replace bcopy by memcpy, the areas can't
overlap.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.376 24-Feb-2018 ozaki-r

branches: 1.376.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because the lock ordering rule is violated:
softnet_lock is taken while holding IFNET_LOCK.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to reliably protect the network stack.
This is required, for example, for PR kern/51356.

Fix PR kern/53043


# 1.375 09-Feb-2018 maxv

Remove dead code.


# 1.374 07-Feb-2018 maxv

Remove null check on ip, it can't be null. (Confuses code scanners.)


# 1.373 06-Feb-2018 maxv

Typos and style a bit, no functional change.


# 1.372 05-Feb-2018 maxv

Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.


# 1.371 05-Feb-2018 maxv

Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.


# 1.370 05-Feb-2018 maxv

Clean up this mess. This is typically the kind of places where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.


# 1.369 05-Feb-2018 maxv

Be tougher, and don't allow LSRR+SSRR (RFC7126).


# 1.368 05-Feb-2018 maxv

Kick duplicate options, they are not allowed (RFC791).
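Rejecting duplicates can be sketched as an option walk that remembers which option types it has already seen. This is a hypothetical standalone helper, not NetBSD's ip_dooptions(), which is structured differently; NOP padding may repeat, and malformed lengths are rejected as well. The IPOPT_EOL and IPOPT_NOP values (0 and 1) follow RFC 791.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPOPT_EOL 0	/* end of option list */
#define IPOPT_NOP 1	/* no operation (padding) */

/*
 * Walk the cnt option bytes following the fixed IPv4 header. Returns
 * false if an option type other than NOP appears twice or if an option
 * length is malformed.
 */
bool
ip_opts_no_dups(const uint8_t *opt, size_t cnt)
{
	bool seen[256] = { false };
	size_t off = 0;

	while (off < cnt) {
		uint8_t type = opt[off];
		size_t optlen;

		if (type == IPOPT_EOL)
			break;
		if (type == IPOPT_NOP) {
			off++;
			continue;
		}
		if (off + 1 >= cnt)
			return false;		/* no room for the length byte */
		optlen = opt[off + 1];
		if (optlen < 2 || optlen > cnt - off)
			return false;		/* bad option length */
		if (seen[type])
			return false;		/* duplicate option */
		seen[type] = true;
		off += optlen;
	}
	return true;
}
```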


# 1.367 05-Feb-2018 maxv

Remove unused variable.


# 1.366 05-Feb-2018 maxv

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.


# 1.365 05-Feb-2018 maxv

Style, no functional change.


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies of SO_TIMESTAMP/SO_OTIMESTAMP processing into a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that would be difficult to debug by bisection.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we have
confirmed that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant for a
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.2; 1.319.4; 1.319.6; 1.319.10;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.
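A minimal sketch of how a reserved-port table could be consulted during anonymous port selection, assuming a 65536-bit bitmap. The commit does not describe the data structure or the sysctl interface, so port_reserved_set(), port_is_reserved() and pick_anon_port() are hypothetical names, and a real allocator would also skip ports that are already bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical table: one bit per port, set when the port is reserved. */
#define NPORTS 65536u
static uint32_t reserved_ports[NPORTS / 32];

static void port_reserved_set(uint16_t port, bool on)
{
    uint32_t mask = UINT32_C(1) << (port % 32);

    if (on)
        reserved_ports[port / 32] |= mask;
    else
        reserved_ports[port / 32] &= ~mask;
}

static bool port_is_reserved(uint16_t port)
{
    return (reserved_ports[port / 32] >> (port % 32)) & 1u;
}

/*
 * Hypothetical anonymous-port picker: scan [lo, hi] circularly, starting
 * at 'hint', and return the first port that is not reserved, or 0 if
 * every port in the range is reserved.
 */
static uint16_t pick_anon_port(uint16_t lo, uint16_t hi, uint16_t hint)
{
    assert(lo <= hi && hint >= lo && hint <= hi);
    uint32_t span = (uint32_t)hi - lo + 1;

    for (uint32_t i = 0; i < span; i++) {
        uint16_t p = (uint16_t)(lo + (hint - lo + i) % span);
        if (!port_is_reserved(p))
            return p;
    }
    return 0;
}
```

Returning 0 as the failure value only works because the sketch assumes lo is at least 1.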


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

branches: 1.298.2; 1.298.6; 1.298.8;
check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
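The deferred-drain pattern can be sketched in userland C11 as follows. The names ip_drain(), ip_fasttimo() and reassembly_flush() are hypothetical stand-ins, and the C11 atomic flag is an assumption of the sketch; the kernel uses its own primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the expensive cleanup (e.g. freeing
 * fragment queues) that must not run from the drain hook itself. */
static int flush_count;
static void reassembly_flush(void) { flush_count++; }

static atomic_bool drain_needed = false;

/* Drain hook: may be called with arbitrary locks held, so it only
 * records that work is needed and returns immediately. */
static void ip_drain(void)
{
    atomic_store(&drain_needed, true);
}

/* Periodic timer handler: runs where taking the subsystem's own locks
 * is safe, and performs the deferred work at most once per tick. */
static void ip_fasttimo(void)
{
    if (atomic_exchange(&drain_needed, false))
        reassembly_flush();
}
```

atomic_exchange() tests and clears the flag in one step, so several drain requests between two timer ticks collapse into a single flush.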


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.
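A userland sketch of consuming this option: the program sends itself one datagram over loopback and reads the TTL from the control message. The helper names are hypothetical. As we understand it, NetBSD and FreeBSD report the value with cmsg_type IP_RECVTTL as a single byte, while Linux reports it with cmsg_type IP_TTL as an int, so the sketch accepts either form.

```c
#define _DEFAULT_SOURCE 1   /* expose BSD socket extensions on glibc */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Walk the control messages returned by recvmsg(2) and extract the TTL. */
static int ttl_from_cmsgs(struct msghdr *msg)
{
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level != IPPROTO_IP ||
            (cm->cmsg_type != IP_RECVTTL && cm->cmsg_type != IP_TTL))
            continue;
        if (cm->cmsg_len >= CMSG_LEN(sizeof(int))) {
            int v;                          /* Linux: int payload */
            memcpy(&v, CMSG_DATA(cm), sizeof(v));
            return v;
        }
        return *(unsigned char *)CMSG_DATA(cm);  /* BSD: one byte */
    }
    return -1;
}

/* Send one datagram to ourselves with IP_RECVTTL enabled and return the
 * TTL reported in the ancillary data, or -1 on any error. */
static int loopback_recv_ttl(void)
{
    struct sockaddr_in sin;
    socklen_t slen = sizeof(sin);
    char data[16];
    union { struct cmsghdr hdr; unsigned char buf[64]; } ctl;
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg;
    int on = 1, ttl = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks */
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (setsockopt(fd, IPPROTO_IP, IP_RECVTTL, &on, sizeof(on)) == 0 &&
        bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == 0 &&
        getsockname(fd, (struct sockaddr *)&sin, &slen) == 0 &&
        sendto(fd, "x", 1, 0, (struct sockaddr *)&sin, sizeof(sin)) == 1 &&
        recvmsg(fd, &msg, 0) >= 0)
        ttl = ttl_from_cmsgs(&msg);
    close(fd);
    return ttl;
}
```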


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
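A minimal sketch of per-CPU counters collated on read, assuming a fixed CPU count and the flat uint64_t array representation introduced in 1.264. The counter names, NCPU, ip_statinc() and ip_stats_collate() are hypothetical; in the kernel the current CPU would be implicit rather than passed in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counter indices; the real list is much longer. */
enum { IP_STAT_TOTAL, IP_STAT_BADSUM, IP_STAT_FORWARD, IP_NSTATS };

#define NCPU 4  /* hypothetical fixed CPU count for the sketch */

/* Each CPU owns one row; only the owning CPU ever writes to it. */
static uint64_t ipstat_percpu[NCPU][IP_NSTATS];

static void ip_statinc(int cpu, int stat)
{
    assert(cpu >= 0 && cpu < NCPU && stat >= 0 && stat < IP_NSTATS);
    ipstat_percpu[cpu][stat]++;
}

/* Collate on demand (e.g. when a sysctl read arrives): sum every CPU's
 * row into one flat uint64_t array. */
static void ip_stats_collate(uint64_t out[IP_NSTATS])
{
    memset(out, 0, sizeof(uint64_t) * IP_NSTATS);
    for (int cpu = 0; cpu < NCPU; cpu++)
        for (int s = 0; s < IP_NSTATS; s++)
            out[s] += ipstat_percpu[cpu][s];
}
```

Because each row has a single writer, increments in this sketch need no lock; the trade-off is that a sum taken while CPUs keep counting is only an approximate snapshot.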


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
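The commit names the technique but not its details. The sketch below is one way to realize it, assuming a table holding a shuffled permutation of all 65536 IDs and a look-back window of 32768 slots; it is not the NetBSD code. A toy xorshift generator stands in for arc4random() so the example is portable, and it must not be used for real IP IDs.

```c
#include <assert.h>
#include <stdint.h>

#define ID_SPACE  65536u  /* every 16-bit value appears once in the table */
#define ID_WINDOW 32768u  /* assumed maximum look-back distance */

static uint16_t id_table[ID_SPACE];
static uint32_t id_index;
static uint32_t rng_state = 0x9e3779b9u;

/* xorshift32: a toy PRNG standing in for arc4random(); NOT suitable for
 * real IP IDs, which need an unpredictable source. */
static uint32_t toy_random(void)
{
    uint32_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return rng_state = x;
}

/* Fill the table with a shuffled permutation of 0..65535 using the
 * "inside-out" Fisher-Yates shuffle. */
static void id_init(void)
{
    for (uint32_t i = 0; i < ID_SPACE; i++) {
        uint32_t j = toy_random() % (i + 1);   /* slight modulo bias */
        id_table[i] = id_table[j];
        id_table[j] = (uint16_t)i;
    }
    id_index = 0;
}

/* Emit the value at the read position, then swap it into a random slot
 * up to ID_WINDOW - 1 positions behind the read position.  Values only
 * move into already-read slots, so an emitted ID stays out of circulation
 * for roughly ID_SPACE - ID_WINDOW further calls.  Zero is skipped. */
static uint16_t id_next(void)
{
    uint16_t r;

    do {
        uint32_t i = id_index % ID_SPACE;
        uint32_t back = toy_random() % ID_WINDOW;
        uint32_t j = (id_index - back) % ID_SPACE;
        r = id_table[i];
        id_table[i] = id_table[j];
        id_table[j] = r;
        id_index++;
    } while (r == 0);
    return r;
}
```

Swapping the emitted value into a slot behind the read position is what provides the sliding window: with these parameters, any 32768 consecutive IDs are distinct, while each new ID still depends on fresh randomness.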


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at an offset of less than 68 bytes (minimal fragment size).
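A minimal sketch of the check described in 1.258, together with the first-fragment length floor added in 1.259, assuming ip_off and ip_len are already in host byte order and that "68 bytes long" refers to the IP total length. IP_MF and IP_OFFMASK mirror the values in <netinet/ip.h>; MIN_FRAG_BYTES and ip_frag_size_ok() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IP_MF          0x2000  /* "more fragments" flag in ip_off */
#define IP_OFFMASK     0x1fff  /* fragment offset, in 8-byte units */
#define MIN_FRAG_BYTES 68      /* hypothetical name for the 68-byte floor */

/* Return true if a datagram with this ip_off field and total length
 * passes the minimum-fragment rule; unfragmented datagrams always pass. */
static bool ip_frag_size_ok(uint16_t ip_off, uint16_t ip_len)
{
    uint32_t off_bytes = (uint32_t)(ip_off & IP_OFFMASK) * 8;
    bool more = (ip_off & IP_MF) != 0;

    if (off_bytes == 0 && !more)
        return true;                        /* not fragmented */
    if (off_bytes == 0)
        return ip_len >= MIN_FRAG_BYTES;    /* first fragment */
    return off_bytes >= MIN_FRAG_BYTES;     /* non-initial fragment */
}
```

Because offsets are carried in 8-byte units, the smallest non-initial offset that passes is 72 bytes (offset field 9).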


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC, spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

    struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
    struct sockaddr *sockaddr_copy(struct sockaddr *dst,
        const struct sockaddr *src);
    struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
    void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement's correctness
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
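The chaining scheme from this revision can be sketched in userland C with <sys/queue.h>. Only the function names in_rtcache(), in_rtflush() and in_rtflushall() come from the message; the struct layouts, the reference counting and the signatures are assumptions for illustration. The invariant is that a cache is on the chain exactly when its ro_rt is non-NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct rtentry { int rt_refcnt; };   /* hypothetical stand-in */

/* Hypothetical cut-down route cache: a cached route plus chain linkage. */
struct route {
    struct rtentry *ro_rt;
    LIST_ENTRY(route) ro_chain;
};

static LIST_HEAD(, route) in_rtcache_head =
    LIST_HEAD_INITIALIZER(in_rtcache_head);

/* Remember a freshly looked-up route and link the cache onto the chain. */
static void in_rtcache(struct route *ro, struct rtentry *rt)
{
    assert(ro->ro_rt == NULL);
    rt->rt_refcnt++;
    ro->ro_rt = rt;
    LIST_INSERT_HEAD(&in_rtcache_head, ro, ro_chain);
}

/* Invalidate one cache: drop the reference, clear ro_rt, unlink it. */
static void in_rtflush(struct route *ro)
{
    if (ro->ro_rt == NULL)
        return;
    ro->ro_rt->rt_refcnt--;
    ro->ro_rt = NULL;
    LIST_REMOVE(ro, ro_chain);
}

/* Invalidate every IPv4 route cache, e.g. after a route is added. */
static void in_rtflushall(void)
{
    while (!LIST_EMPTY(&in_rtcache_head))
        in_rtflush(LIST_FIRST(&in_rtcache_head));
}
```

Draining the chain with LIST_FIRST() in a loop avoids relying on LIST_FOREACH_SAFE(), which not every <sys/queue.h> provides.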


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

    s = splfoo();
    /* frob data structure */
    splx(s);
    pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; its CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
  time.tv_sec -> time_second
- struct timeval mono_time is gone
  mono_time.tv_sec -> time_uptime
- access to time via
  {get,}{micro,nano,bin}time()
  get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
  Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
  NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error with a datagram
that had an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connecting over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
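The drop-half policy and the cached-nmbclusters check described in this entry can be sketched in user-space C. This is a hypothetical illustration, not the NetBSD code: `struct frag_state`, `check_nmbclusters()` and `maybe_drop_half()` are invented names, and only the counter is adjusted rather than freeing real fragment chains. Only the nmbclusters/4 threshold, the 50% drop fraction and the recompute-on-change idea come from the log message.

```c
#include <assert.h>

/* Hypothetical model of reassembly state; not NetBSD's struct ipq. */
struct frag_state {
	int cached_nmbclusters;	/* cached copy, in the spirit of ip_nmbclusters */
	int maxfrags;		/* derived limit, in the spirit of ip_maxfrags */
	int nfrags;		/* fragments queued, in the spirit of ip_nfrags */
};

/* Recompute derived limits only when the cluster count has changed. */
static void
check_nmbclusters(struct frag_state *fs, int nmbclusters)
{
	if (fs->cached_nmbclusters != nmbclusters) {
		fs->cached_nmbclusters = nmbclusters;
		fs->maxfrags = nmbclusters / 4;
	}
}

/*
 * If the queue is over the limit, drop half of all queued fragments and
 * return how many were dropped.  A real implementation would walk the
 * reassembly hash buckets and free whole fragment chains; this sketch
 * only adjusts the counter.
 */
static int
maybe_drop_half(struct frag_state *fs, int nmbclusters)
{
	int ndrop;

	check_nmbclusters(fs, nmbclusters);
	if (fs->nfrags <= fs->maxfrags)
		return 0;
	ndrop = fs->nfrags / 2;
	fs->nfrags -= ndrop;
	assert(fs->nfrags >= 0);
	return ndrop;
}
```

With nmbclusters = 1024 the limit is 256; a queue of 300 fragments drops 150, and a second call drops nothing because the queue is back under the limit. Halving, rather than trimming just to the limit, is what makes the drop multiplicative.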


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() corrides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the option TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench messages, based on router-req RFC (they could also be
abused as a DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
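The 127/8 rule from this entry can be sketched as a small C predicate. This is a hypothetical helper, not the NetBSD code (which works on struct ip fields inside ip_input()); `is_loopback_net()` and `should_drop_on_wire()` are invented names. Addresses are assumed to be in network byte order, as they are in a received header.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the address (network byte order) lies in 127.0.0.0/8. */
static bool
is_loopback_net(struct in_addr a)
{
	/* The top 8 bits hold the class-A network number; loopback is 127. */
	return (ntohl(a.s_addr) >> 24) == 127;
}

/* Sketch of the RFC 1122 rule: neither src nor dst may be in 127/8 on the wire. */
static bool
should_drop_on_wire(struct in_addr src, struct in_addr dst)
{
	return is_loopback_net(src) || is_loopback_net(dst);
}
```

Checking both addresses matters because a spoofed 127.x source is as suspect as a 127.x destination; the source-routing question in the XXX note above is deliberately left out of this sketch.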


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port numbers assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL
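A simplified sketch of the subnet-broadcast test that hostzerobroadcast influences. The function name is hypothetical and addresses are in host byte order; the kernel's real broadcast test handles more cases (for example the limited broadcast address and interface flags), which are left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified subnet-broadcast test (hypothetical helper, host byte order).
 * The all-ones host part is always a broadcast address; the all-zeros host
 * part counts only when hostzerobroadcast is nonzero.
 */
bool
sketch_is_subnet_broadcast(uint32_t addr, uint32_t subnet, uint32_t mask,
    int hostzerobroadcast)
{
	uint32_t host = addr & ~mask;

	if ((addr & mask) != (subnet & mask))
		return false;           /* not on this subnet at all */
	if (host == ~mask)
		return true;            /* all-ones host part */
	return hostzerobroadcast != 0 && host == 0;
}
```

With hostzerobroadcast set to 0, 192.168.1.0 on 192.168.1.0/24 is treated as an ordinary address rather than a broadcast.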


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets that have a Class D (multicast) address as their source address.
Implements the first half of PR 7003.
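The Class D test itself is a single mask comparison. A sketch with hypothetical function names, taking addresses in host byte order; the kernel's own IN_MULTICAST() macro expresses a similar test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Class D (multicast) is 224.0.0.0/4: the top four bits are 1110. */
bool
sketch_is_classd(uint32_t addr_host_order)
{
	return (addr_host_order & 0xf0000000u) == 0xe0000000u;
}

/* Input-path rule: a multicast source address means the packet is dropped. */
bool
sketch_drop_for_source(uint32_t src_host_order)
{
	return sketch_is_classd(src_host_order);
}
```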


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently only netatalk and
netinet are supported, and only netinet has been tested.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < (ip_hl << 2) and drop such packets
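A hedged sketch of length checks on a raw IPv4 header. Only the ip_len < (ip_hl << 2) comparison is what this commit describes; the other checks are the usual companions and are included for context. The function name is hypothetical, and field offsets follow RFC 791.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Validate the length fields of a raw IPv4 header in pkt, where caplen is
 * the number of bytes actually received.
 */
bool
sketch_ip_lengths_ok(const uint8_t *pkt, size_t caplen)
{
	if (caplen < 20)                                /* need a minimal header */
		return false;
	size_t hlen = (size_t)(pkt[0] & 0x0f) << 2;     /* ip_hl counts 32-bit words */
	size_t ip_len = ((size_t)pkt[2] << 8) | pkt[3]; /* total length, network order */
	if (hlen < 20 || hlen > caplen)                 /* bad header length */
		return false;
	if (ip_len < hlen)                              /* the comparison from this commit */
		return false;
	if (ip_len > caplen)                            /* truncated packet */
		return false;
	return true;
}
```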


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix an off-by-one bug (OBOB) in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
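The incremental checksum adjustment can be illustrated with the update formula from RFC 1624, HC' = ~(~HC + ~m + m'), where m is the old 16-bit word and m' the new one. This is a sketch, not necessarily the arithmetic ipflow_fastforward() actually uses; the function names are hypothetical and the header is treated as raw bytes in network order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t
fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full Internet checksum over an IPv4 header (checksum field zeroed). */
uint16_t
sketch_ip_cksum(const uint8_t *hdr, size_t len)
{
	assert(len % 2 == 0);
	uint32_t sum = 0;
	for (size_t i = 0; i < len; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	return (uint16_t)~fold(sum);
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m') for one changed 16-bit word. */
uint16_t
sketch_cksum_update(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
	uint32_t sum = (uint16_t)~hc + (uint16_t)~old_word + (uint32_t)new_word;
	return (uint16_t)~fold(sum);
}

/* Decrement TTL (byte 8) in place and patch the checksum (bytes 10-11). */
void
sketch_dec_ttl(uint8_t *hdr)
{
	assert(hdr[8] > 1); /* a forwarder drops packets whose TTL would expire */
	uint16_t old_word = (uint16_t)((hdr[8] << 8) | hdr[9]); /* TTL | protocol */
	uint16_t hc = (uint16_t)((hdr[10] << 8) | hdr[11]);
	hdr[8]--;
	uint16_t new_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
	hc = sketch_cksum_update(hc, old_word, new_word);
	hdr[10] = (uint8_t)(hc >> 8);
	hdr[11] = (uint8_t)hc;
}
```

Only the one 16-bit word holding TTL and protocol changes, so the update touches three values instead of re-summing the whole header.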


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
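A sketch of the range validation and of cycling through the available set only once. All names are hypothetical, IPNOPRIVPORTS is ignored, and scanning downward from the last port handed out is an assumption about direction rather than a transcription of the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_IPPORT_RESERVED 1024 /* same value as BSD's IPPORT_RESERVED */

/* Range rules described for the sysctls (IPNOPRIVPORTS ignored). */
bool
sketch_anonport_range_ok(int min, int max)
{
	return min >= SKETCH_IPPORT_RESERVED && max <= 65535 && min < max;
}

/* Caller-supplied "is this port already bound?" predicate (hypothetical). */
typedef bool (*sketch_in_use_fn)(int port, void *arg);

/*
 * Try each port in [min, max] at most once, starting just below *lastport
 * (or at max if *lastport is outside the range) and wrapping from min back
 * to max. Returns 0 if every port is taken.
 */
int
sketch_pick_anonport(int min, int max, int *lastport,
    sketch_in_use_fn in_use, void *arg)
{
	int port = *lastport;

	for (int count = max - min + 1; count > 0; count--) {
		if (port <= min || port > max)
			port = max;             /* (re)start at the top */
		else
			port--;
		if (!in_use(port, arg)) {
			*lastport = port;
			return port;
		}
	}
	return 0;
}

/* Example predicate: ports listed in a small array are in use. */
struct sketch_portset {
	const int *used;
	size_t n;
};

bool
sketch_port_in_set(int port, void *arg)
{
	const struct sketch_portset *s = arg;

	for (size_t i = 0; i < s->n; i++)
		if (s->used[i] == port)
			return true;
	return false;
}
```

The loop counter, not the port value, bounds the search, so a fully occupied range terminates after max - min + 1 probes.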


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>
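The size check amounts to bounding where each fragment ends. A sketch with hypothetical names: IP_MAXPACKET is 65535, the fragment offset field counts 8-byte units, and the exact point at which the real code performs the check may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IP_MAXPACKET 65535 /* largest IPv4 datagram, as in <netinet/ip.h> */

/*
 * A fragment covers payload bytes [off_units * 8, off_units * 8 + payload_len)
 * of the original datagram. Reject it if the reassembled datagram, including
 * its hlen-byte header, would exceed IP_MAXPACKET.
 */
bool
sketch_frag_fits(uint16_t off_units, uint16_t payload_len, uint16_t hlen)
{
	uint32_t end = (uint32_t)off_units * 8u + payload_len + hlen;

	return end <= SKETCH_IP_MAXPACKET;
}
```

The largest offset (0x1fff units, 65528 bytes) plus a 20-byte header leaves room for at most 7 payload bytes, so oversized reassemblies are caught before any concatenation.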


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
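To illustrate why the reserved bit has to be masked out, here are two fragment tests on the host-order ip_off field. The mask values match <netinet/ip.h>; the naive variant is a hypothetical example of the kind of assumption the commit describes, not the code that was replaced.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag and offset masks for the 16-bit ip_off field (host byte order). */
#define IP_RF      0x8000 /* reserved fragment flag, must be zero */
#define IP_DF      0x4000 /* don't fragment */
#define IP_MF      0x2000 /* more fragments */
#define IP_OFFMASK 0x1fff /* fragment offset, in 8-byte units */

/* Naive: treats any bit other than DF, including RF, as "fragment". */
bool
sketch_is_fragment_naive(uint16_t ip_off)
{
	return (ip_off & ~IP_DF) != 0;
}

/* RF and DF masked out: only MF or a nonzero offset mean "fragment". */
bool
sketch_is_fragment(uint16_t ip_off)
{
	return (ip_off & (IP_MF | IP_OFFMASK)) != 0;
}
```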


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the Alpha). Biggest problem: IP headers were overlaid with a
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.390 15-Sep-2019 bouyer

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.389 13-May-2019 ozaki-r

Count packets dropped by pfil


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.388 17-Jan-2019 knakahara

Fix ipsecif(4) being unable to apply input-direction packet filters. Reviewed by ozaki-r@n.o and ryo@n.o.

ATF tests will be added later.


Revision tags: pgoyette-compat-1226 pgoyette-compat-1126
# 1.387 15-Nov-2018 maxv

Remove the 't' argument from m_tag_find().


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.386 02-Sep-2018 maxv

remove reference to ipnat, and duplicate comments


Revision tags: pgoyette-compat-0728
# 1.385 10-Jul-2018 maxv

Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.384 17-May-2018 maxv

branches: 1.384.2;
Add KASSERTs, related to PR/39794.


# 1.383 14-May-2018 maxv

Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.


# 1.382 10-May-2018 maxv

Rename ipsec4_forward -> ipsec_mtu, and switch to void.


Revision tags: pgoyette-compat-0502
# 1.381 26-Apr-2018 maxv

Remove unused mbuf argument from sbsavetimestamp.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.380 15-Apr-2018 maxv

Introduce a m_verify_packet function, that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.


# 1.379 11-Apr-2018 maxv

Don't pass IP_ALLOWBROADCAST in ipsec4_input. The flag lands in
ipsec_getpolicybyaddr, and only IP_FORWARDING is taken.

In fact it would be good to change the 'flags' argument of ipsec4_input
to be a boolean, same for ipsec_getpolicybyaddr. It would be less
misleading.


# 1.378 11-Apr-2018 maxv

Add comment about IPsec.


# 1.377 11-Apr-2018 maxv

Small changes in ip_dooptions: replace bcopy by memcpy, the areas can't
overlap.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.376 24-Feb-2018 ozaki-r

branches: 1.376.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is taken while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to reliably protect the network stack.
This is required, for example, for PR kern/51356.

Fix PR kern/53043


# 1.375 09-Feb-2018 maxv

Remove dead code.


# 1.374 07-Feb-2018 maxv

Remove null check on ip, it can't be null. (Confuses code scanners.)


# 1.373 06-Feb-2018 maxv

Typos and style a bit, no functional change.


# 1.372 05-Feb-2018 maxv

Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.


# 1.371 05-Feb-2018 maxv

Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.


# 1.370 05-Feb-2018 maxv

Clean up this mess. This is typically the kind of places where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.


# 1.369 05-Feb-2018 maxv

Be tougher, and don't allow LSRR+SSRR (RFC7126).


# 1.368 05-Feb-2018 maxv

Kick duplicate options, they are not allowed (RFC791).
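A sketch of option validation combining this rule with the LSRR+SSRR restriction from 1.369. It only validates; the real ip_dooptions() also processes each option, and the function name is hypothetical. Option type numbers (EOL 0, NOP 1, LSRR 131, SSRR 137) follow RFC 791.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_IPOPT_EOL  0
#define SK_IPOPT_NOP  1
#define SK_IPOPT_LSRR 131
#define SK_IPOPT_SSRR 137

/*
 * Walk the option bytes that follow the fixed 20-byte header. Return false
 * for malformed options, any repeated option type, or LSRR and SSRR together.
 */
bool
sketch_ip_options_ok(const uint8_t *opts, size_t len)
{
	bool seen[256] = { false };
	size_t i = 0;

	while (i < len) {
		uint8_t type = opts[i];
		size_t optlen;

		if (type == SK_IPOPT_EOL)
			break;
		if (type == SK_IPOPT_NOP) {
			i++;
			continue;               /* padding may repeat */
		}
		if (i + 1 >= len)
			return false;           /* no room for the length byte */
		optlen = opts[i + 1];
		if (optlen < 2 || optlen > len - i)
			return false;           /* length runs off the end */
		if (seen[type])
			return false;           /* duplicate option */
		seen[type] = true;
		i += optlen;
	}
	if (seen[SK_IPOPT_LSRR] && seen[SK_IPOPT_SSRR])
		return false;
	return true;
}
```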


# 1.367 05-Feb-2018 maxv

Remove unused variable.


# 1.366 05-Feb-2018 maxv

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.


# 1.365 05-Feb-2018 maxv

Style, no functional change.


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies of the SO_TIMESTAMP/SO_OTIMESTAMP processing into a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is split out
to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Storing a pointer to an interface in an mbuf isn't safe once the big
kernel locks are removed; an interface object (ifnet) can be destroyed at
any time during packet processing, and accessing such an object via a
pointer is racy. Instead we have to look the object up in the interface
collection (ifindex2ifnet) via an interface index (if_index), which is
stored in the mbuf instead of a pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant as a
transitional measure, i.e., it is intended for places that are not
planned to be MP-ified soon.

The change adds some psref overhead to performance-sensitive paths;
however, the overhead is not serious: about a 2% drop at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.2; 1.319.4; 1.319.6; 1.319.10;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on whether IPsec is enabled and
whether rules exist.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, clean up a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals (not set yet).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

branches: 1.298.2; 1.298.6; 1.298.8;
check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
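The per-CPU scheme can be pictured roughly as follows. This is an illustrative sketch, not NetBSD's code: the kernel allocates its counters with percpu(9) and collates them through netstat_sysctl(), whereas here a fixed two-dimensional array (with an assumed SKETCH_NCPU of 4) stands in for per-CPU storage, and stat_inc(), stat_collate() and the STAT_* indices are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NCPU  4  /* assumed CPU count for the sketch */
#define STAT_TOTAL   0  /* hypothetical counter indices, in the spirit of */
#define STAT_BADSUM  1  /* the uint64_t array form of ipstat (rev 1.264)  */
#define STAT_NSTATS  2

/*
 * One row of counters per CPU. In a kernel each row would live in that
 * CPU's private storage, so the fast path needs no lock or atomic op.
 */
static uint64_t percpu_stats[SKETCH_NCPU][STAT_NSTATS];

/* Fast path: called per packet, touches only the current CPU's row. */
static void
stat_inc(unsigned cpu, unsigned idx)
{
	assert(cpu < SKETCH_NCPU && idx < STAT_NSTATS);
	percpu_stats[cpu][idx]++;
}

/* Slow path: sum every CPU's row, as a sysctl handler would on request. */
static void
stat_collate(uint64_t out[STAT_NSTATS])
{
	memset(out, 0, STAT_NSTATS * sizeof(out[0]));
	for (unsigned c = 0; c < SKETCH_NCPU; c++)
		for (unsigned i = 0; i < STAT_NSTATS; i++)
			out[i] += percpu_stats[c][i];
}
```

The trade-off is a more expensive read (a sum over all CPUs) in exchange for increments that never contend on a shared cache line, which presumably suits counters bumped per packet but read only when someone runs netstat. A real implementation also has to cope with reading another CPU's counters while they are being updated, which this sketch ignores.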


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
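The log only names the technique. As a rough illustration (a userland sketch, not the code committed in 1.262), a table holding every 16-bit ID can be shuffled one step per call: hand out the entry at the current index, then swap it with a random slot at most 32767 positions behind. A returned ID is parked behind the index, so it cannot come back until the index wraps around to it, at least 32769 calls later. The names idgen_init() and idgen_next() are hypothetical, and the xorshift32 generator is an assumed stand-in for a proper random source such as arc4random().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ID_SPACE   65536u  /* every 16-bit ID appears exactly once */
#define ID_WINDOW  32768u  /* swap partner is at most this far behind */

struct idgen {
	uint16_t slots[ID_SPACE];  /* always a permutation of 0..65535 */
	uint32_t index;            /* number of IDs handed out so far */
	uint32_t rng;              /* xorshift32 state (stand-in RNG) */
};

/* xorshift32: fast but NOT unpredictable; a stand-in only. */
static uint32_t
idgen_random(struct idgen *g)
{
	uint32_t x = g->rng;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	g->rng = x;
	return x;
}

static void
idgen_init(struct idgen *g, uint32_t seed)
{
	/* The window mask below requires a power of two. */
	assert((ID_WINDOW & (ID_WINDOW - 1)) == 0);

	g->rng = seed != 0 ? seed : 1;  /* xorshift state must be non-zero */
	g->index = 0;
	for (uint32_t i = 0; i < ID_SPACE; i++)
		g->slots[i] = (uint16_t)i;

	/* One classic Fisher-Yates pass (modulo bias ignored for brevity). */
	for (uint32_t i = ID_SPACE - 1; i > 0; i--) {
		uint32_t j = idgen_random(g) % (i + 1);
		uint16_t t = g->slots[i];

		g->slots[i] = g->slots[j];
		g->slots[j] = t;
	}
}

/* Return the ID at the current slot and park it in the trailing window. */
static uint16_t
idgen_next(struct idgen *g)
{
	uint32_t back = idgen_random(g) & (ID_WINDOW - 1);
	uint32_t i = g->index & (ID_SPACE - 1);
	uint32_t j = (g->index - back) & (ID_SPACE - 1);  /* wraps correctly */
	uint16_t id = g->slots[i];

	g->slots[i] = g->slots[j];
	g->slots[j] = id;
	g->index++;
	return id;
}
```

Each call does a constant amount of work, and every swap keeps the table a permutation, so each ID is handed out once per wrap of the index.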


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
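The check described in revisions 1.258 through 1.260 can be sketched as a standalone predicate. This is not the ip_input.c code: frag_ok() and the FRAG_* constants are hypothetical, and the sketch assumes the 68-byte limit is compared against the fragment offset (counted in payload bytes) for later fragments and against the total datagram length for the first fragment. The intent, presumably, is that no later fragment can overlap the start of the transport header carried in the first one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRAG_MF       0x2000u  /* "more fragments" flag (same value as IP_MF) */
#define FRAG_OFFMASK  0x1fffu  /* 13-bit offset in 8-byte units (IP_OFFMASK) */
#define FRAG_MIN      68u      /* minimum fragment size from the log */

/*
 * Hypothetical helper: ip_off and ip_len are in host byte order, and
 * ip_len is the total datagram length including the IP header.
 */
static bool
frag_ok(uint16_t ip_off, uint16_t ip_len)
{
	uint32_t off = (uint32_t)(ip_off & FRAG_OFFMASK) * 8;  /* byte offset */
	bool more = (ip_off & FRAG_MF) != 0;

	if (off == 0 && !more)
		return true;                /* not a fragment at all */
	if (off == 0)
		return ip_len >= FRAG_MIN;  /* first fragment: at least 68 bytes */
	return off >= FRAG_MIN;         /* later ones may not start inside them */
}
```

Because offsets are multiples of 8, the smallest offset this accepts for a non-initial fragment is 72 bytes.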


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC code paths, the spl level was not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

    struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
    struct sockaddr *sockaddr_copy(struct sockaddr *dst,
        const struct sockaddr *src);
    struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
    void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement is obviously
correct "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

    s = splfoo();
    /* frob data structure */
    splx(s);
    pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half a regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
    time.tv_sec -> time_second
- struct timeval mono_time is gone
    mono_time.tv_sec -> time_uptime
- access to time via
    {get,}{micro,nano,bin}time()
    get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
    Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
    NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human-friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error with a datagram
packet that has an incorrect checksum. (From Liam Foy)
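
A minimal C sketch of the ordering described in 1.221, assuming a raw 20-byte
IPv4 header in a byte array rather than an mbuf. The helper names
(ip_hdr_cksum, ip_ttl_decrement, ip_forward_prepare) are hypothetical, not
NetBSD's functions; the in-place checksum patch follows the incremental update
of RFC 1624. Every refusal check runs before the header is modified, so an
ICMP error would quote the datagram exactly as it was received.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over an IPv4 header (RFC 1071); hlen is even. */
static uint16_t
ip_hdr_cksum(const uint8_t *hdr, size_t hlen)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < hlen; i += 2)
		sum += (uint32_t)hdr[i] << 8 | hdr[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Decrement ip_ttl (byte 8) and patch ip_sum (bytes 10-11) in place using
 * HC' = ~(~HC + ~m + m'), where m is the 16-bit word holding TTL|protocol.
 */
static void
ip_ttl_decrement(uint8_t *hdr)
{
	uint16_t old_word = (uint16_t)(hdr[8] << 8 | hdr[9]);
	uint16_t new_word, hc;
	uint32_t sum;

	hdr[8]--;
	new_word = (uint16_t)(hdr[8] << 8 | hdr[9]);
	hc = (uint16_t)(hdr[10] << 8 | hdr[11]);
	sum = (uint16_t)~hc + (uint16_t)~old_word + (uint32_t)new_word;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	hc = (uint16_t)~sum;
	hdr[10] = (uint8_t)(hc >> 8);
	hdr[11] = (uint8_t)(hc & 0xff);
}

/* Hypothetical pre-forward step: refuse first, modify the header last. */
static int
ip_forward_prepare(uint8_t *hdr, bool have_route)
{
	if (hdr[8] <= 1)
		return -1;	/* caller would send ICMP time exceeded */
	if (!have_route)
		return -2;	/* caller would send ICMP net unreachable */
	ip_ttl_decrement(hdr);
	return 0;
}
```

With the no-route check placed after the decrement instead, the quoted
header would already carry the modified TTL.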


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
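
A minimal C sketch of the drop-half strategy from 1.194, assuming a flat array
of hypothetical reass_entry records instead of NetBSD's hashed struct ipq
lists. It frees whole datagrams, oldest (lowest remaining TTL) first, until at
most half of the queued fragments remain. The kernel's actual victim
selection may differ; all names here are illustrative only.

```c
#include <stddef.h>

/* Hypothetical reassembly-queue entry (not NetBSD's struct ipq). */
struct reass_entry {
	unsigned nfrags;	/* fragments held for this datagram */
	unsigned ttl;		/* remaining lifetime; smaller means older */
	int	 live;		/* nonzero while the entry is still queued */
};

/* Threshold for the drop-half phase: nmbclusters / 4. */
static unsigned
reass_maxfrags(unsigned nmbclusters)
{
	return nmbclusters / 4;
}

/*
 * If more than maxfrags fragments are queued, free whole entries, oldest
 * first, until at most half of the starting count remains.  *total must
 * equal the sum of nfrags over live entries.  Returns fragments freed.
 */
static unsigned
reass_drop_half(struct reass_entry *q, size_t n, unsigned *total,
    unsigned maxfrags)
{
	unsigned target, freed = 0;

	if (*total <= maxfrags)
		return 0;
	target = *total / 2;
	while (*total > target) {
		struct reass_entry *victim = NULL;

		for (size_t i = 0; i < n; i++)
			if (q[i].live &&
			    (victim == NULL || q[i].ttl < victim->ttl))
				victim = &q[i];
		if (victim == NULL)
			break;		/* nothing left to free */
		victim->live = 0;
		*total -= victim->nfrags;
		freed += victim->nfrags;
	}
	return freed;
}
```

Halving, rather than trimming just below the limit, means a sustained flood
trips the expensive scan far less often.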


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.
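
A minimal C sketch of a hashed reassembly lookup in the spirit of 1.190. The
key fields (source, destination, ID, protocol) are the ones RFC 791 uses to
match fragments of one datagram; the bucket count, the mixing function and
every identifier below are hypothetical and are not the hash that NetBSD or
FreeBSD actually use.

```c
#include <stddef.h>
#include <stdint.h>

#define REASS_HASHSIZE	64u	/* hypothetical; must be a power of two */

/* Fields identifying the fragments of one datagram. */
struct frag_key {
	uint32_t src, dst;
	uint16_t id;
	uint8_t	 proto;
};

struct frag_entry {
	struct frag_key	   key;
	struct frag_entry *next;	/* chain within one bucket */
};

static struct frag_entry *reass_table[REASS_HASHSIZE];

/* Hypothetical mixing function. */
static unsigned
reass_hash(const struct frag_key *k)
{
	uint32_t h = k->src ^ (k->dst * 2654435761u) ^
	    ((uint32_t)k->id << 8) ^ k->proto;

	h ^= h >> 16;
	return h & (REASS_HASHSIZE - 1);
}

static int
frag_key_eq(const struct frag_key *a, const struct frag_key *b)
{
	return a->src == b->src && a->dst == b->dst &&
	    a->id == b->id && a->proto == b->proto;
}

/* Scan one short chain instead of every queued datagram. */
static struct frag_entry *
reass_lookup(const struct frag_key *k)
{
	for (struct frag_entry *e = reass_table[reass_hash(k)]; e != NULL;
	    e = e->next)
		if (frag_key_eq(&e->key, k))
			return e;
	return NULL;
}

static void
reass_insert(struct frag_entry *e)
{
	unsigned b = reass_hash(&e->key);

	e->next = reass_table[b];
	reass_table[b] = e;
}
```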


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() corrides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
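
A minimal C sketch of the align-if-needed path from 1.154, assuming a flat
byte buffer rather than an mbuf chain. hdr_aligned_p() plays the role of the
*_HDR_ALIGNED_P() macros, and hdr_align() copies a misaligned header into
aligned storage, loosely like m_copyup(). Both function names and the
struct ip_words stand-in are hypothetical. On a __NO_STRICT_ALIGNMENT port
the real macros expand to 1 and this path is never taken.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in with the alignment an IPv4 header needs (32-bit words). */
struct ip_words {
	uint32_t w[15];		/* room for a maximal 60-byte header */
};

/* Is p aligned well enough to be read as 32-bit header words? */
static bool
hdr_aligned_p(const void *p)
{
	return ((uintptr_t)p & (alignof(struct ip_words) - 1)) == 0;
}

/*
 * Return p if it is already aligned; otherwise copy the first len bytes
 * (at most sizeof(*buf)) into the aligned buffer and return that.
 */
static const void *
hdr_align(const void *p, size_t len, struct ip_words *buf)
{
	if (hdr_aligned_p(p))
		return p;
	memcpy(buf, p, len < sizeof(*buf) ? len : sizeof(*buf));
	return buf;
}
```

Drivers that already deliver aligned packets pay only for the single pointer
test.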


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench messages, based on the router-requirements RFC (they
could also be abused as a DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
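
A minimal C sketch of the two address rules in this log: class D sources,
dropped since 1.85, and 127/8 on either side of a packet that arrived on a
non-loopback interface, dropped since 1.129 (RFC 1122 says such addresses
must not appear outside a host). Addresses are in host byte order and the
helper names are hypothetical; applying the class D check on loopback
interfaces as well is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

static bool
in_loopback_net(uint32_t a)
{
	return (a >> 24) == 127;			/* 127.0.0.0/8 */
}

static bool
in_multicast(uint32_t a)
{
	return (a & 0xf0000000u) == 0xe0000000u;	/* 224.0.0.0/4, class D */
}

/* Hypothetical input filter: true means the datagram should be dropped. */
static bool
ip_input_bogus_addr(uint32_t src, uint32_t dst, bool rcvif_is_loopback)
{
	if (in_multicast(src))
		return true;	/* a multicast group can never be a sender */
	if (!rcvif_is_loopback &&
	    (in_loopback_net(src) || in_loopback_net(dst)))
		return true;	/* 127/8 must not appear on the wire */
	return false;
}
```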


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not a negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL
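The hostzerobroadcast rule above can be sketched as a small predicate. This is a hypothetical helper, not the kernel's in_broadcast(): addresses and masks are host-order uint32_t values, and treating a /32 as having no broadcast address is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: is `addr` a broadcast address for the subnet
 * `net`/`mask`?  All values are in host byte order.
 */
static bool
subnet_is_broadcast(uint32_t addr, uint32_t net, uint32_t mask,
    bool hostzerobroadcast)
{
	uint32_t host = addr & ~mask;

	if (mask == 0xffffffffu)	/* /32: no host part (sketch assumption) */
		return false;
	if ((addr & mask) != net)	/* not on this subnet at all */
		return false;
	if (host == ~mask)		/* all-ones host part: always broadcast */
		return true;
	/* all-zeros host part counts only when hostzerobroadcast is set */
	return hostzerobroadcast && host == 0;
}
```

With hostzerobroadcast cleared, 192.168.1.0 on a 192.168.1.0/24 subnet becomes an ordinary host address, while 192.168.1.255 stays a broadcast address either way.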


# 1.87 04-May-1999 hwr

It does not make much sense to increase an "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets that have a Class-D address as their source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently only netatalk and
netinet are supported, and only netinet has been tested.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < (ip_hl << 2) and drop the packet accordingly
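The check above can be sketched on a raw header. This is a hypothetical helper, not the kernel's ip_input() code: it reads a network-order IPv4 header from `caplen` received bytes, and the surrounding checks (minimum header size, ip_len versus received length) are included only to make the example complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of IPv4 header length sanity checks on a raw,
 * network-order header of `caplen` received bytes.
 */
static bool
ip_lengths_ok(const uint8_t *pkt, size_t caplen)
{
	size_t hlen, ip_len;

	if (caplen < 20)			/* too short for a minimal header */
		return false;
	hlen = (size_t)(pkt[0] & 0x0f) << 2;	/* ip_hl counts 32-bit words */
	ip_len = (size_t)pkt[2] << 8 | pkt[3];	/* total length, big-endian */
	if (hlen < 20)				/* ip_hl below 5 */
		return false;
	if (ip_len < hlen)			/* the ip_len < ip_hl << 2 case */
		return false;
	if (ip_len > caplen)			/* claims more data than arrived */
		return false;
	return true;
}
```

Without the `ip_len < hlen` test, later code computing the payload length as `ip_len - hlen` would underflow.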


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix off-by-one bug in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
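An incremental checksum adjustment for a TTL decrement can be sketched with RFC 1624's equation HC' = ~(~HC + ~m + m'). This is an illustration of the technique under that assumption, not necessarily the exact arithmetic ipflow_fastforward uses; the helper names are hypothetical, and the header is handled as raw big-endian bytes (TTL at offset 8, checksum at offsets 10-11).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 checksum over a big-endian header.  With the checksum field
 * zeroed this yields the value to store; over a header that already
 * carries a valid checksum it yields 0.
 */
static uint16_t
ip_cksum_full(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)hdr[i] << 8 | hdr[i + 1];
	while (sum >> 16)			/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'), m/m' = old/new 16-bit word. */
static uint16_t
cksum_adjust(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
	uint32_t sum = (uint16_t)~hc;

	sum += (uint16_t)~old_word;
	sum += new_word;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Decrement the TTL in place and patch the checksum; caller ensures TTL > 1. */
static void
ip_decrement_ttl(uint8_t *hdr)
{
	uint16_t old_word = (uint16_t)(hdr[8] << 8 | hdr[9]);	/* TTL|proto */
	uint16_t hc = (uint16_t)(hdr[10] << 8 | hdr[11]);
	uint16_t new_word;

	hdr[8]--;
	new_word = (uint16_t)(hdr[8] << 8 | hdr[9]);
	hc = cksum_adjust(hc, old_word, new_word);
	hdr[10] = (uint8_t)(hc >> 8);
	hdr[11] = (uint8_t)(hc & 0xff);
}
```

Only the 16-bit word holding the TTL changes, so the forwarder touches three header bytes instead of re-summing all twenty.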


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Significantly improves performance on hosts with a large number of IP addresses.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
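The constraints listed above for anonportmin and anonportmax can be sketched as a validator. The function is hypothetical, not the kernel's sysctl handler; IPPORT_RESERVED is redefined with its usual BSD value so the example stands alone, and using 0 as the floor when IPNOPRIVPORTS is in effect is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define IPPORT_RESERVED	1024	/* usual BSD value from <netinet/in.h> */

/*
 * Hypothetical validator for a proposed anonportmin/anonportmax pair.
 * `noprivports` models a kernel built with IPNOPRIVPORTS.
 */
static bool
anonport_range_ok(int min, int max, bool noprivports)
{
	int lowest = noprivports ? 0 : IPPORT_RESERVED;	/* assumed floor */

	if (min < lowest || max < lowest)	/* below the reserved ports */
		return false;
	if (min > 65535 || max > 65535)		/* not a valid port number */
		return false;
	return min < max;			/* min must be strictly below max */
}
```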


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>
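The length sanity check above can be sketched as follows. This is a hypothetical helper, not the reassembly code itself: it asks whether a fragment, placed at its offset, would extend the datagram past IP_MAXPACKET (redefined locally with its standard value of 65535).

```c
#include <assert.h>
#include <stdbool.h>

#define IP_MAXPACKET	65535	/* largest possible IPv4 datagram */

/*
 * Hypothetical sketch: does this fragment still fit within IP_MAXPACKET?
 * frag_off is the 13-bit offset field (8-byte units); hlen and total_len
 * come from the fragment's own header.
 */
static bool
frag_within_limit(unsigned frag_off, unsigned hlen, unsigned total_len)
{
	unsigned payload, end;

	assert(total_len >= hlen);	/* lengths validated earlier */
	payload = total_len - hlen;
	end = frag_off * 8 + payload;	/* one past the last payload byte */
	return hlen + end <= IP_MAXPACKET;
}
```

A fragment at the maximum offset (0x1fff, i.e. byte 65528) carrying 100 payload bytes fails this check, which is the classic oversized-reassembly case.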


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
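The masking can be sketched as a fragment test on the ip_off field in host byte order. A datagram is a fragment only if IP_MF or any offset bit is set, so IP_RF and IP_DF do not affect the result. The flag values match <netinet/ip.h> but are redefined here so the example stands alone; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IP_RF		0x8000	/* reserved fragment flag */
#define IP_DF		0x4000	/* don't fragment */
#define IP_MF		0x2000	/* more fragments */
#define IP_OFFMASK	0x1fff	/* fragment offset bits */

/* Hypothetical helper: ip_off in host byte order. */
static bool
ip_is_fragment(uint16_t ip_off)
{
	/* Mask out IP_RF and IP_DF; only MF or a nonzero offset matter. */
	return (ip_off & (IP_MF | IP_OFFMASK)) != 0;
}
```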


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlaid with a
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.389 13-May-2019 ozaki-r

Count packets dropped by pfil


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.388 17-Jan-2019 knakahara

Fix ipsecif(4) being unable to apply input-direction packet filters. Reviewed by ozaki-r@n.o and ryo@n.o.

Add ATF later.


Revision tags: pgoyette-compat-1226 pgoyette-compat-1126
# 1.387 15-Nov-2018 maxv

Remove the 't' argument from m_tag_find().


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.386 02-Sep-2018 maxv

remove reference to ipnat, and duplicate comments


Revision tags: pgoyette-compat-0728
# 1.385 10-Jul-2018 maxv

Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.384 17-May-2018 maxv

Add KASSERTs, related to PR/39794.


# 1.383 14-May-2018 maxv

Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.


# 1.382 10-May-2018 maxv

Rename ipsec4_forward -> ipsec_mtu, and switch to void.


Revision tags: pgoyette-compat-0502
# 1.381 26-Apr-2018 maxv

Remove unused mbuf argument from sbsavetimestamp.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.380 15-Apr-2018 maxv

Introduce a m_verify_packet function, that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.


# 1.379 11-Apr-2018 maxv

Don't pass IP_ALLOWBROADCAST in ipsec4_input. The flag lands in
ipsec_getpolicybyaddr, and only IP_FORWARDING is taken.

In fact it would be good to change the 'flags' argument of ipsec4_input
to be a boolean, same for ipsec_getpolicybyaddr. It would be less
misleading.


# 1.378 11-Apr-2018 maxv

Add comment about IPsec.


# 1.377 11-Apr-2018 maxv

Small changes in ip_dooptions: replace bcopy by memcpy, the areas can't
overlap.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.376 24-Feb-2018 ozaki-r

branches: 1.376.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is taken while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to reliably protect the network stack.
This is required, for example, for PR kern/51356.

Fix PR kern/53043


# 1.375 09-Feb-2018 maxv

Remove dead code.


# 1.374 07-Feb-2018 maxv

Remove null check on ip, it can't be null. (Confuses code scanners.)


# 1.373 06-Feb-2018 maxv

Typos and style a bit, no functional change.


# 1.372 05-Feb-2018 maxv

Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.


# 1.371 05-Feb-2018 maxv

Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.


# 1.370 05-Feb-2018 maxv

Clean up this mess. This is typically the kind of place where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.


# 1.369 05-Feb-2018 maxv

Be tougher, and don't allow LSRR+SSRR (RFC7126).


# 1.368 05-Feb-2018 maxv

Kick duplicate options, they are not allowed (RFC791).
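The two option rules above (no duplicate options, no LSRR together with SSRR) can be sketched as a single pass over the options area. This is a hypothetical validator, not ip_dooptions(): it rejects every repeated option type except NOP, which is a simplification, and it checks only option framing, not each option's internal contents. IPOPT_* values match <netinet/ip.h> and are redefined so the example stands alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPOPT_EOL	0	/* end of option list */
#define IPOPT_NOP	1	/* no operation (padding) */
#define IPOPT_LSRR	131	/* loose source and record route */
#define IPOPT_SSRR	137	/* strict source and record route */

/* Hypothetical validator for the IPv4 options area (after the 20-byte header). */
static bool
ip_options_ok(const uint8_t *opt, size_t len)
{
	bool seen[256] = { false };
	size_t i = 0;

	while (i < len) {
		uint8_t type = opt[i];
		uint8_t olen;

		if (type == IPOPT_EOL)
			break;
		if (type == IPOPT_NOP) {	/* single byte, may repeat */
			i++;
			continue;
		}
		if (i + 1 >= len)		/* no room for the length byte */
			return false;
		olen = opt[i + 1];
		if (olen < 2 || olen > len - i)	/* malformed length */
			return false;
		if (seen[type])			/* duplicate option */
			return false;
		seen[type] = true;
		i += olen;
	}
	/* Loose and strict source routing together is contradictory. */
	return !(seen[IPOPT_LSRR] && seen[IPOPT_SSRR]);
}
```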


# 1.367 05-Feb-2018 maxv

Remove unused variable.


# 1.366 05-Feb-2018 maxv

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.


# 1.365 05-Feb-2018 maxv

Style, no functional change.


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and add a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo
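The size-based interpretation of IP_PKTINFO in point 4 can be sketched as a classifier over (optval, optlen). It is hypothetical and is not the kernel's ip_ctloutput() code: `struct pktinfo_sketch` stands in for struct in_pktinfo (address plus interface index), and the enum names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct in_pktinfo: address + interface index. */
struct pktinfo_sketch {
	uint32_t	addr;
	unsigned int	ifindex;
};

enum pktinfo_action {
	PKTINFO_RECV_ON,	/* Linux style: int, nonzero */
	PKTINFO_RECV_OFF,	/* Linux style: int, zero */
	PKTINFO_SET_DEFAULT,	/* Solaris style: default source for output */
	PKTINFO_INVALID		/* size matches neither form */
};

static enum pktinfo_action
classify_ip_pktinfo(const void *optval, size_t optlen)
{
	if (optlen == sizeof(int)) {
		int on;

		memcpy(&on, optval, sizeof(on));	/* optval may be unaligned */
		return on ? PKTINFO_RECV_ON : PKTINFO_RECV_OFF;
	}
	if (optlen == sizeof(struct pktinfo_sketch))
		return PKTINFO_SET_DEFAULT;
	return PKTINFO_INVALID;
}
```

The scheme relies on the two sizes differing, which holds on common platforms where int is 4 bytes and the structure is 8.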


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies of SO_TIMESTAMP/SO_OTIMESTAMP processing into a single
function, and add a SOOPT_TIMESTAMP define, reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is separated
to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer to an interface in an mbuf

Having a pointer to an interface in an mbuf isn't safe once we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, so accessing it via a stored pointer is racy. Instead we
have to look the object up in the interface collection (ifindex2ifnet) via
the interface index (if_index), which is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is a transitional
measure, i.e., it is intended to be used in places that are not planned to
be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
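The conversion described above (keep times on the uptime base internally, translate to the wall-clock base only when talking to userland) can be sketched with two pure helpers. They are hypothetical and not the kernel's functions; the caller is assumed to supply the current readings of both clocks, analogous to time_uptime and time_second.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helpers: translate a timestamp between an uptime base
 * (monotonic, like time_uptime) and a wall-clock base (like
 * time_second), given both clocks' current readings.
 */
static time_t
uptime_to_wallclock(time_t t_uptime, time_t now_uptime, time_t now_wall)
{
	/* Preserve the distance from "now", re-expressed on the wall clock. */
	return now_wall + (t_uptime - now_uptime);
}

static time_t
wallclock_to_uptime(time_t t_wall, time_t now_uptime, time_t now_wall)
{
	return now_uptime + (t_wall - now_wall);
}
```

Because expirations are stored on the uptime base, a settimeofday(2) jump only shifts the value reported to userland; it never changes when an entry actually expires.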


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.2; 1.319.4; 1.319.6; 1.319.10;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c,
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

branches: 1.298.2; 1.298.6; 1.298.8;
check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and I for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC, spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating, and
freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement is visibly
correct "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent, though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to call rtflush() on the source
and rtcache() on the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl until we are sure that we can forward this packet.
Before, if there was no route, we would call icmp_error with a datagram
packet that had an incorrect checksum. (From Liam Foy)
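A sketch of the ordering this change enforces, on a raw 20-byte IPv4 header held in a byte array: the route check happens first, and only after it succeeds is the TTL decremented and the header checksum patched incrementally (RFC 1624, eqn. 3). forward_dec_ttl() and its have_route flag are hypothetical; this is not the kernel's ip_forward() code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Full RFC 1071 checksum over a header of big-endian 16-bit words. */
static uint16_t
ip_cksum(const uint8_t *h, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)h[i] << 8 | h[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* RFC 1624 incremental update, eqn. 3: HC' = ~(~HC + ~m + m'). */
static uint16_t
cksum_update16(uint16_t hc, uint16_t m, uint16_t m_new)
{
	uint32_t sum = (uint16_t)~hc + (uint16_t)~m + m_new;

	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Hypothetical forwarding step (TTL at byte 8, checksum at bytes
 * 10-11).  On failure the header is left untouched, so its checksum
 * is still valid when the caller quotes it in an ICMP error.
 */
static bool
forward_dec_ttl(uint8_t *h, bool have_route)
{
	if (!have_route || h[8] <= 1)
		return false;	/* no route or TTL would expire */

	uint16_t old_w = (uint16_t)(h[8] << 8 | h[9]);
	uint16_t hc = (uint16_t)(h[10] << 8 | h[11]);

	h[8]--;			/* only now touch the TTL ... */
	uint16_t new_w = (uint16_t)(h[8] << 8 | h[9]);
	hc = cksum_update16(hc, old_w, new_w);	/* ... and fix the sum */
	h[10] = hc >> 8;
	h[11] = hc & 0xff;
	return true;
}
```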


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0
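Why the sign matters, shown on a hypothetical countdown loop rather than the kernel's own function:

```c
/*
 * Walk an array from the last element down to index 0.  `i' must be
 * signed: if it were unsigned, `i >= 0' would always be true and the
 * decrement past 0 would wrap to UINT_MAX instead of ending the loop.
 */
static long
sum_backwards(const int *v, int n)
{
	long sum = 0;

	for (int i = n - 1; i >= 0; i--)
		sum += v[i];
	return sum;
}
```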


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
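A simplified sketch of the drop-half policy and the cached-limit recomputation described above. The globals mirror the names in the log but are stand-ins with an assumed nmbclusters value, the queue is modeled as an array of per-packet fragment counts, and victims are simply the oldest packets; the kernel's actual choice of which fragments to drop is not reproduced here.

```c
#include <stddef.h>

/* Stand-ins for the kernel globals named above (values assumed). */
static int nmbclusters = 1024;	/* system-wide mbuf cluster limit */
static int ip_nmbclusters;	/* cached copy of nmbclusters */
static int ip_maxfrags;		/* threshold for the drop-half phase */

/* Recompute derived limits only if nmbclusters changed since last use. */
static void
ip_nmbclusters_check(void)
{
	if (ip_nmbclusters != nmbclusters) {
		ip_nmbclusters = nmbclusters;
		ip_maxfrags = ip_nmbclusters / 4;
	}
}

/*
 * frags[i] is the fragment count of queued packet i, oldest first.
 * If the total exceeds ip_maxfrags, free whole packets, oldest first,
 * until no more than half of the original total remains.
 * Returns the number of fragments still queued.
 */
static int
ip_reass_check_drophalf(int *frags, size_t npkts)
{
	int total = 0;
	size_t i;

	ip_nmbclusters_check();
	for (i = 0; i < npkts; i++)
		total += frags[i];
	if (total <= ip_maxfrags)
		return total;

	int target = total / 2;
	for (i = 0; i < npkts && total > target; i++) {
		total -= frags[i];
		frags[i] = 0;		/* packet i and its fragments freed */
	}
	return total;
}
```

Halving on each trigger, rather than trimming just below the threshold, is what makes the drop multiplicative: a sustained flood cannot keep the queue pinned right at the limit.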


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.
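A minimal sketch of a hashed reassembly queue: each datagram's identifying fields select one bucket, so a lookup scans a single short list instead of one global list. The struct, bucket count, and hash function are hypothetical illustrations, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define REASS_NHASH	64	/* assumed bucket count, a power of two */

/* Hypothetical, simplified reassembly-queue entry. */
struct reassq {
	struct reassq *next;	/* next entry in the same bucket */
	uint32_t src, dst;	/* addresses, host byte order */
	uint16_t id;
	uint8_t proto;
};

static struct reassq *reass_buckets[REASS_NHASH];

/* Illustrative hash; not the kernel's actual hash macro. */
static unsigned
reass_hash(uint32_t src, uint32_t dst, uint16_t id)
{
	uint32_t h = src ^ dst ^ id;

	h ^= h >> 16;
	h ^= h >> 8;
	return h & (REASS_NHASH - 1);
}

static void
reass_insert(struct reassq *q)
{
	unsigned b = reass_hash(q->src, q->dst, q->id);

	q->next = reass_buckets[b];
	reass_buckets[b] = q;
}

/* Only the one bucket for this datagram is searched. */
static struct reassq *
reass_lookup(uint32_t src, uint32_t dst, uint16_t id, uint8_t proto)
{
	struct reassq *q;

	for (q = reass_buckets[reass_hash(src, dst, id)]; q != NULL;
	    q = q->next)
		if (q->src == src && q->dst == dst && q->id == id &&
		    q->proto == proto)
			return q;
	return NULL;
}
```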


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add an IP(v4)-specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() corrides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate an ICMP redirect when a packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

backout previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
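A userland sketch of the *_HDR_ALIGNED_P() idea, with a hypothetical struct fake_ip standing in for struct ip. The alignment mask here comes from C11 _Alignof, an assumption of this sketch; the kernel macros may compute it differently.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct ip: 20 bytes, 4-byte alignment. */
struct fake_ip {
	uint32_t w[5];
};

/*
 * Expands to 1 where the CPU tolerates unaligned access, so neither
 * the test nor the align-if-needed path costs anything there;
 * otherwise checks the low address bits against the type's alignment.
 */
#ifdef __NO_STRICT_ALIGNMENT
#define FAKE_IP_HDR_ALIGNED_P(p)	1
#else
#define FAKE_IP_HDR_ALIGNED_P(p) \
	((((uintptr_t)(p)) & (_Alignof(struct fake_ip) - 1)) == 0)
#endif
```

A caller would copy the header into an aligned buffer (the m_copyup() path) only when the macro yields 0.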


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench messages, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
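The pseudo-header trick can be sketched as follows, assuming host-order inputs and hypothetical helper names: the pseudo-header partial sum is computed up front and cached, and finish_cksum() is the step deferred to software when the interface cannot checksum the packet itself. This is RFC 1071 arithmetic in general terms, not NetBSD's in_cksum code.

```c
#include <stddef.h>
#include <stdint.h>

/* Add big-endian 16-bit words of p[0..len) to sum (RFC 1071). */
static uint32_t
cksum_add(uint32_t sum, const uint8_t *p, size_t len)
{
	while (len > 1) {
		sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1)			/* odd trailing byte, zero-padded */
		sum += (uint32_t)p[0] << 8;
	return sum;
}

/* Fold carries back into the low 16 bits (end-around carry). */
static uint16_t
cksum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Pseudo-header partial sum: source, destination, protocol and
 * TCP/UDP length.  This part can be computed once and cached.
 */
static uint32_t
pseudo_hdr_sum(uint32_t src, uint32_t dst, uint8_t proto, uint16_t len)
{
	return (src >> 16) + (src & 0xffff) + (dst >> 16) + (dst & 0xffff) +
	    proto + len;
}

/* The deferred software step: finish the sum over the segment. */
static uint16_t
finish_cksum(uint32_t phdr_sum, const uint8_t *seg, size_t len)
{
	return (uint16_t)~cksum_fold(cksum_add(phdr_sum, seg, len));
}
```

On receive, the same functions verify a segment: summing it with its checksum field filled in yields 0 from finish_cksum().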


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
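A minimal sketch of the check, assuming host-order addresses and a hypothetical from_loopback_if flag that exempts packets actually received on the loopback interface (127/8 traffic is normal there, just never on the wire):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr (host byte order) is in 127.0.0.0/8. */
static bool
in_loopback_net(uint32_t addr)
{
	return (addr & 0xff000000u) == 0x7f000000u;
}

/* Drop decision for an arriving packet, per RFC 1122. */
static bool
drop_loopback_martian(uint32_t src, uint32_t dst, bool from_loopback_if)
{
	if (from_loopback_if)
		return false;
	return in_loopback_net(src) || in_loopback_net(dst);
}
```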


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port numbers assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove_hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL
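What the knob changes, as a sketch with host-order addresses and a hypothetical helper (the kernel's in_broadcast() also handles other cases, such as the limited broadcast address, that are omitted here):

```c
#include <stdbool.h>
#include <stdint.h>

static int hostzerobroadcast = 1;	/* default nonzero: old behavior */

/* Is addr a broadcast address of subnet net/mask? */
static bool
subnet_broadcast_p(uint32_t addr, uint32_t net, uint32_t mask)
{
	uint32_t host = addr & ~mask;

	if ((addr & mask) != net)
		return false;
	if (host == ~mask)			/* all-ones host part */
		return true;
	return hostzerobroadcast && host == 0;	/* all-zeros host part */
}
```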


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets that have a Class-D address as their source address.
Implements the first half of PR 7003.
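The test amounts to checking the top four bits of the source address (224.0.0.0/4). The sketch below assumes the address is in network byte order, as in `struct ip`, and uses a hypothetical function name.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the network-byte-order address lies in 224.0.0.0/4 (Class D). */
static bool src_is_class_d(uint32_t src_net)
{
	return (ntohl(src_net) & 0xf0000000u) == 0xe0000000u;
}
```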


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-address input/output statistics. Currently only netatalk
and netinet are supported, and only netinet has been tested.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl << 2 and drop the packet accordingly
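Because `ip_hl` counts 32-bit words, the comparison is made against `ip_hl << 2` bytes. This minimal sketch takes the two fields as plain integers in host byte order, and its helper name is hypothetical. The minimum-header test is included for context and is not part of this revision.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MIN_HLEN 20	/* size of an option-less IPv4 header */

/*
 * ip_hl counts 32-bit words, so the header length in bytes is ip_hl << 2.
 * ip_len is the total datagram length in bytes (host order here).
 */
static bool ip_lengths_sane(unsigned ip_hl, uint16_t ip_len)
{
	unsigned hlen = ip_hl << 2;

	if (hlen < IP_MIN_HLEN)
		return false;	/* header shorter than the fixed part */
	if (ip_len < hlen)
		return false;	/* datagram shorter than its own header */
	return true;
}
```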


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.
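For a valid header, the ones'-complement sum of all its 16-bit words, checksum field included, is 0xffff. The checksum can therefore be verified read-only. The sketch below assumes the header has already been copied into host-order 16-bit words, and its helper name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Verify an IPv4 header checksum without modifying the header: the
 * ones'-complement sum of every 16-bit word, checksum included, must
 * be 0xffff.  The header is only read, never written.
 */
static bool ip_cksum_ok(const uint16_t *words, size_t nwords)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < nwords; i++)
		sum += words[i];
	while (sum >> 16)		/* fold carries back in */
		sum = (sum & 0xffffu) + (sum >> 16);
	return sum == 0xffffu;
}
```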


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
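The incremental adjustment can follow RFC 1624's HC' = ~(~HC + ~m + m'). This is a sketch, not the ipflow code itself. It assumes host-order 16-bit words and uses hypothetical helper names. The TTL is the high byte of the word it shares with the protocol field, so decrementing the TTL by one lowers that word by 0x0100.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement addition of two 16-bit words (end-around carry). */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffffu) + (s >> 16));
}

/* Full checksum over `n` host-order words; the checksum word must be 0. */
static uint16_t ip_cksum_full(const uint16_t *w, size_t n)
{
	uint16_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum = oc_add(sum, w[i]);
	return (uint16_t)~sum;
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'), touching only one word. */
static uint16_t ip_cksum_adjust(uint16_t hc, uint16_t m_old, uint16_t m_new)
{
	uint16_t s = oc_add((uint16_t)~hc, (uint16_t)~m_old);

	return (uint16_t)~oc_add(s, m_new);
}
```

Ones'-complement sums are independent of byte order. The same update therefore also works on network-order words, as long as all three operands use the same order.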


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
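The constraints on the two sysctls can be expressed as a single validation routine. In this sketch the function name is hypothetical, and a `noprivports` flag stands in for the IPNOPRIVPORTS option.

```c
#include <stdbool.h>

#define IPPORT_RESERVED_SKETCH	1024	/* IPPORT_RESERVED on BSD systems */
#define PORT_MAX_SKETCH		65535

/* Validate a proposed anonportmin/anonportmax pair. */
static bool anonport_range_ok(int min, int max, bool noprivports)
{
	if (min < 0 || max > PORT_MAX_SKETCH)
		return false;		/* outside the 16-bit port space */
	if (!noprivports && min < IPPORT_RESERVED_SKETCH)
		return false;		/* would hand out privileged ports */
	return min < max;		/* anonportmin must be < anonportmax */
}
```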


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>
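A sketch of a length check of this kind, applied per fragment. It assumes host-order fields and uses a hypothetical function name; the committed code may check the total at a different point. The fragment offset is carried in 8-byte units, so it is shifted left by 3 before being added to the payload and header lengths.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MAXPACKET_SKETCH	65535	/* IP_MAXPACKET in <netinet/ip.h> */
#define IP_OFFMASK_SKETCH	0x1fff	/* fragment-offset bits of ip_off */

/*
 * Would a fragment with this ip_off field, payload length and header
 * length end within the largest possible IP datagram?
 */
static bool frag_fits(uint16_t ip_off, uint32_t payload_len, uint32_t hlen)
{
	uint32_t end = ((uint32_t)(ip_off & IP_OFFMASK_SKETCH) << 3) +
	    payload_len;

	return hlen + end <= IP_MAXPACKET_SKETCH;
}
```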


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
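The masking can be illustrated with the `ip_off` bit layout. This sketch uses hypothetical `SK_` names that mirror IP_RF, IP_DF, IP_MF and IP_OFFMASK from `<netinet/ip.h>`, and assumes host byte order. Whether a datagram is a fragment depends only on MF and the offset, so a set reserved bit does not change the answer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the ip_off field (host order). */
#define SK_IP_RF	0x8000	/* reserved fragment flag */
#define SK_IP_DF	0x4000	/* don't fragment */
#define SK_IP_MF	0x2000	/* more fragments */
#define SK_IP_OFFMASK	0x1fff	/* fragment offset, 8-byte units */

/* A fragment has MF set or a nonzero offset; RF and DF are ignored. */
static bool ip_is_fragment(uint16_t ip_off)
{
	return (ip_off & (SK_IP_MF | SK_IP_OFFMASK)) != 0;
}
```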


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.
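The figure 68 is presumably the largest possible IPv4 header plus the 8 bytes of payload that ICMP error messages quote. `ip_hl` is a 4-bit count of 32-bit words, so the header is at most 15 * 4 = 60 bytes. The arithmetic, with hypothetical macro names:

```c
/* Largest IPv4 header: 4-bit ip_hl field, counted in 32-bit words. */
#define IP_MAX_HLEN	(0xf << 2)			/* 60 bytes */
/* ICMP errors quote the offending header plus 8 bytes of its data. */
#define ICMP_QUOTE_DATA	8
#define ICMP_SAVE_LEN	(IP_MAX_HLEN + ICMP_QUOTE_DATA)	/* 68 bytes */

_Static_assert(ICMP_SAVE_LEN == 68, "max IP header plus 8 data bytes");
```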


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118
# 1.388 17-Jan-2019 knakahara

Fix ipsecif(4) being unable to apply input-direction packet filters. Reviewed by ozaki-r@n.o and ryo@n.o.

Add ATF later.


Revision tags: pgoyette-compat-1226 pgoyette-compat-1126
# 1.387 15-Nov-2018 maxv

Remove the 't' argument from m_tag_find().


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.386 02-Sep-2018 maxv

remove reference to ipnat, and duplicate comments


Revision tags: pgoyette-compat-0728
# 1.385 10-Jul-2018 maxv

Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.


Revision tags: phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521
# 1.384 17-May-2018 maxv

Add KASSERTs, related to PR/39794.


# 1.383 14-May-2018 maxv

Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.


# 1.382 10-May-2018 maxv

Rename ipsec4_forward -> ipsec_mtu, and switch to void.


Revision tags: pgoyette-compat-0502
# 1.381 26-Apr-2018 maxv

Remove unused mbuf argument from sbsavetimestamp.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.380 15-Apr-2018 maxv

Introduce a m_verify_packet function, that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.


# 1.379 11-Apr-2018 maxv

Don't pass IP_ALLOWBROADCAST in ipsec4_input. The flag lands in
ipsec_getpolicybyaddr, and only IP_FORWARDING is taken.

In fact it would be good to change the 'flags' argument of ipsec4_input
to be a boolean, same for ipsec_getpolicybyaddr. It would be less
misleading.


# 1.378 11-Apr-2018 maxv

Add comment about IPsec.


# 1.377 11-Apr-2018 maxv

Small changes in ip_dooptions: replace bcopy by memcpy, the areas can't
overlap.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.376 24-Feb-2018 ozaki-r

branches: 1.376.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is taken while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to reliably protect the network stack.
This is required, for example, for PR kern/51356.

Fix PR kern/53043


# 1.375 09-Feb-2018 maxv

Remove dead code.


# 1.374 07-Feb-2018 maxv

Remove null check on ip, it can't be null. (Confuses code scanners.)


# 1.373 06-Feb-2018 maxv

Typos and style a bit, no functional change.


# 1.372 05-Feb-2018 maxv

Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.


# 1.371 05-Feb-2018 maxv

Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.


# 1.370 05-Feb-2018 maxv

Clean up this mess. This is typically the kind of places where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.


# 1.369 05-Feb-2018 maxv

Be tougher, and don't allow LSRR+SSRR (RFC7126).


# 1.368 05-Feb-2018 maxv

Kick duplicate options, they are not allowed (RFC791).
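Both this check and the LSRR+SSRR rejection in 1.369 fall out of a single walk over the option list that records which option types have been seen. This is a minimal sketch with hypothetical function and enum names, not the actual ip_dooptions() code. The option type values are the standard ones from `<netinet/ip.h>`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL		0	/* IPOPT_EOL: end of option list */
#define OPT_NOP		1	/* IPOPT_NOP: one-byte padding */
#define OPT_LSRR	131	/* IPOPT_LSRR: loose source route */
#define OPT_SSRR	137	/* IPOPT_SSRR: strict source route */

enum opt_result { OPT_OK = 0, OPT_MALFORMED, OPT_DUPLICATE, OPT_BOTH_SRCRT };

/* Walk the option bytes that follow the fixed 20-byte IPv4 header. */
static enum opt_result check_ip_options(const uint8_t *cp, size_t cnt)
{
	bool seen[256] = { false };

	while (cnt > 0) {
		uint8_t opt = cp[0];
		size_t optlen;

		if (opt == OPT_EOL)
			break;
		if (opt == OPT_NOP) {
			optlen = 1;
		} else {
			if (cnt < 2)
				return OPT_MALFORMED;	/* no length byte */
			optlen = cp[1];
			if (optlen < 2 || optlen > cnt)
				return OPT_MALFORMED;	/* bad length */
			if (seen[opt])
				return OPT_DUPLICATE;	/* RFC 791 */
			seen[opt] = true;
			if (seen[OPT_LSRR] && seen[OPT_SSRR])
				return OPT_BOTH_SRCRT;	/* RFC 7126 */
		}
		cp += optlen;
		cnt -= optlen;
	}
	return OPT_OK;
}
```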


# 1.367 05-Feb-2018 maxv

Remove unused variable.


# 1.366 05-Feb-2018 maxv

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.


# 1.365 05-Feb-2018 maxv

Style, no functional change.


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and add a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code, and makes it easy to identify the remaining
KERNEL_LOCK and/or softnet_lock uses that are held even with NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies of the SO_TIMESTAMP/SO_OTIMESTAMP processing into a single
function, and add a SOOPT_TIMESTAMP define, reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, e.g., by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is split
out to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead,
we have to get the object from the interface collection (ifindex2ifnet) via
the interface index (if_index), which is stored in the mbuf instead of a
pointer.
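The idea is an extra level of indirection. The mbuf keeps a small integer, and each user looks it up and must cope with the interface having vanished. The single-threaded sketch below uses only hypothetical names and shows just that indirection. The real m_get_rcvif_psref() additionally holds a psref(9) reference so that the interface cannot be destroyed while in use.

```c
#include <stddef.h>

#define MAX_IFS_SKETCH 8	/* hypothetical table size */

struct ifnet_sketch { const char *name; };

/* Index -> interface table; slot 0 is unused since if_index starts at 1. */
static struct ifnet_sketch *if_table[MAX_IFS_SKETCH];

/* The packet header stores the interface index, not a pointer. */
struct mbuf_sketch { unsigned rcvif_index; };

/* Look the interface up on each use; NULL means it has been detached. */
static struct ifnet_sketch *get_rcvif(const struct mbuf_sketch *m)
{
	if (m->rcvif_index == 0 || m->rcvif_index >= MAX_IFS_SKETCH)
		return NULL;
	return if_table[m->rcvif_index];
}
```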

The change provides two APIs: m_{get,put}_rcvif_psref, which use psref(9)
for sleepable critical sections, and m_{get,put}_rcvif, which use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant for the
transition period, i.e., it is intended for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance-sensitive paths;
however, the overhead is not serious: 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, e.g., via settimeofday(2), according to time_second(9). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.
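Internally the time stays on the uptime clock. Converting it at the userland boundary only needs one reading of each clock taken at the same moment. A minimal sketch of that arithmetic with plain integer seconds; the function names are hypothetical, and the kernel would read time_uptime and time_second itself instead of taking them as arguments.

```c
#include <stdint.h>

/* Convert an uptime-based instant to wall-clock seconds for userland. */
static int64_t uptime_to_wall(int64_t t_uptime, int64_t now_uptime,
    int64_t now_wall)
{
	return now_wall + (t_uptime - now_uptime);
}

/* Convert a wall-clock instant supplied by userland back to uptime. */
static int64_t wall_to_uptime(int64_t t_wall, int64_t now_uptime,
    int64_t now_wall)
{
	return now_uptime + (t_wall - now_wall);
}
```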

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.2; 1.319.4; 1.319.6; 1.319.10;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on whether IPsec is enabled and
whether any rules exist.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use an array to store hooks and thus be more
cache-friendly (there are only a few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

branches: 1.298.2; 1.298.6; 1.298.8;
check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael of the IPSEC
issue and by me of the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement's correctness
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain-specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent, though.
etherip doesn't need sc_route_expire, similar to the earlier gif changes
from dyoung@.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; its CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the TTL until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error with a datagram
packet that had an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
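
The drop-half policy and the cached-nmbclusters recompute described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the NetBSD implementation. The reassembly queue is a plain array of entries in assumed oldest-first order. struct frag_queue, check_nmbclusters() and drop_half() are hypothetical names. ip_nmbclusters and ip_maxfrags mirror the variables named in the log.

```c
#include <stddef.h>

/* Hypothetical model of one reassembly-queue entry. */
struct frag_queue {
	unsigned nfrags;	/* fragments held by this entry */
	int live;		/* nonzero while the entry is still queued */
};

static unsigned ip_nmbclusters;	/* cached copy of nmbclusters */
static unsigned ip_maxfrags;	/* drop-half threshold */

/* Recompute the derived limit only when nmbclusters has changed. */
static void
check_nmbclusters(unsigned nmbclusters)
{
	if (nmbclusters != ip_nmbclusters) {
		ip_nmbclusters = nmbclusters;
		ip_maxfrags = nmbclusters / 4;
	}
}

/*
 * Free whole entries, in array order (assumed oldest first), until at
 * most half of the queued fragments remain.  Returns the new total.
 */
static unsigned
drop_half(struct frag_queue *q, size_t n, unsigned total)
{
	unsigned target = total / 2;

	for (size_t i = 0; i < n && total > target; i++) {
		if (!q[i].live)
			continue;
		total -= q[i].nfrags;
		q[i].live = 0;
	}
	return total;
}
```

A caller would run check_nmbclusters() first and then, if the fragment count exceeds ip_maxfrags, replace the count with the value drop_half() returns. Dropping whole entries rather than single fragments avoids keeping datagrams that can no longer be completed.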


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.
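
A hashtable of reassembly list heads means a fragment only has to be compared against entries in one bucket. The sketch below shows that shape. The key is the (src, dst, id, proto) tuple that identifies fragments of one datagram. The table size, the mixing function, and all names are hypothetical illustrative choices; this is not the IPREASS_HASH macro used by NetBSD or FreeBSD. Addresses are taken in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

#define REASS_HASHSIZE	64u	/* hypothetical; must be a power of two */
#define REASS_HASHMASK	(REASS_HASHSIZE - 1)

struct reass_entry {
	uint32_t src, dst;
	uint16_t id;
	uint8_t proto;
	struct reass_entry *next;	/* bucket chain */
};

static struct reass_entry *reass_table[REASS_HASHSIZE];

/* Illustrative mixing of the datagram-identifying fields. */
static unsigned
reass_hash(uint32_t src, uint32_t dst, uint16_t id, uint8_t proto)
{
	uint32_t h = src ^ dst ^ ((uint32_t)id << 8) ^ proto;

	h ^= h >> 16;
	h ^= h >> 8;
	return h & REASS_HASHMASK;
}

static void
reass_insert(struct reass_entry *e)
{
	unsigned b = reass_hash(e->src, e->dst, e->id, e->proto);

	e->next = reass_table[b];
	reass_table[b] = e;
}

/* Walk only the one bucket the key hashes to. */
static struct reass_entry *
reass_lookup(uint32_t src, uint32_t dst, uint16_t id, uint8_t proto)
{
	struct reass_entry *e = reass_table[reass_hash(src, dst, id, proto)];

	for (; e != NULL; e = e->next)
		if (e->src == src && e->dst == dst &&
		    e->id == id && e->proto == proto)
			return e;
	return NULL;
}
```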


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
`struct ifq' statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() corrides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
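
The check-then-copy pattern described above can be sketched in portable C. The sketch assumes C11 alignof in place of the kernel macros. struct ip_like stands in for struct ip only for its alignment, and ip_hdr_aligned_p()/ip_hdr_align() are hypothetical names. Unlike the *_HDR_ALIGNED_P() macros, it always performs the test, even on CPUs where __NO_STRICT_ALIGNMENT would make the macros expand to 1. The copy into aligned storage plays the role that m_copyup() plays for mbufs.

```c
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ip; only its alignment matters here. */
struct ip_like {
	uint8_t  vhl, tos;
	uint16_t len, id, off;
	uint8_t  ttl, proto;
	uint16_t sum;
	uint32_t src, dst;
};

/* Nonzero if p meets the ABI alignment of struct ip_like. */
static int
ip_hdr_aligned_p(const void *p)
{
	return ((uintptr_t)p & (alignof(struct ip_like) - 1)) == 0;
}

/*
 * Return p unchanged if it is already aligned; otherwise copy the
 * header into caller-provided aligned storage and return that copy.
 */
static const struct ip_like *
ip_hdr_align(const void *p, struct ip_like *scratch)
{
	if (ip_hdr_aligned_p(p))
		return p;
	memcpy(scratch, p, sizeof(*scratch));
	return scratch;
}
```

As the log notes, the fast path costs nothing beyond the test when drivers already hand up aligned packets.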


# 1.153 13-Jun-2002 itojun

set IPv4 parameters to modern values.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench messages, based on the router-req RFC (they could also
be abused as a DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
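
The 127/8 rule can be sketched as a small predicate. The sketch assumes that packets received on a loopback interface are exempt, since the rule is about addresses appearing "on wire". It takes addresses in host byte order. ip_reject_loopback() and in_loopbacknet() are hypothetical names, not the ip_input() code.

```c
#include <stdint.h>

/* 127.0.0.0/8 in host byte order. */
static int
in_loopbacknet(uint32_t a)
{
	return (a >> 24) == 127;
}

/*
 * Nonzero if the packet should be dropped: a 127/8 source or
 * destination seen on a non-loopback interface (RFC 1122).
 */
static int
ip_reject_loopback(uint32_t src, uint32_t dst, int rcvif_is_loopback)
{
	if (rcvif_is_loopback)
		return 0;
	return in_loopbacknet(src) || in_loopbacknet(dst);
}
```

The rejected packets would also be counted in ipstat.ips_badaddr, per revision 1.130.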


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to the set minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.
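
The RFC 791 bounds on the timestamp option can be sketched as follows. The option is four octets of header (type 68, length, pointer, overflow/flag) followed by entries. The pointer's smallest legal value is 5. Flag 0 records 4-octet timestamps, and flags 1 and 3 record 8-octet address/timestamp pairs. The 40-octet cap is the IP options area limit. ts_check() and its result codes are hypothetical names. The sketch treats the area as full when the next entry would not fit, in which case RFC 791 says to increment the overflow count rather than reject the packet.

```c
enum ts_result { TS_BAD, TS_FULL, TS_ROOM };

/*
 * optlen: option length octet; ptr: pointer octet;
 * flg: low four bits of the overflow/flag octet.
 */
static enum ts_result
ts_check(unsigned optlen, unsigned ptr, unsigned flg)
{
	unsigned entry;

	if (optlen < 4 || optlen > 40)
		return TS_BAD;		/* cannot hold the option header */
	if (ptr < 5)
		return TS_BAD;		/* below the smallest legal pointer */
	switch (flg) {
	case 0:
		entry = 4;		/* timestamp only */
		break;
	case 1:
	case 3:
		entry = 8;		/* address + timestamp */
		break;
	default:
		return TS_BAD;		/* undefined flag value */
	}
	if (ptr + entry - 1 > optlen)
		return TS_FULL;		/* bump the overflow count instead */
	return TS_ROOM;
}
```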


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets, that have a Class-D address as source address.
Implements the first half of PR 7003.
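
The class-D source test amounts to a mask comparison on the top four bits of the address, the same test as the IN_CLASSD()/IN_MULTICAST() macros in <netinet/in.h>. The helper name below is hypothetical, and the address is taken in host byte order.

```c
#include <stdint.h>

/* Nonzero if a (host byte order) is in 224.0.0.0/4; such a source is dropped. */
static int
ip_src_is_classd(uint32_t a)
{
	return (a & 0xf0000000u) == 0xe0000000u;
}
```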


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Significantly improves performance on hosts with a large number of IP addresses.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.


# 1.36 14-Sep-1996 mrg

move the packet filter hooks into a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS IDs, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.375 09-Feb-2018 maxv

Remove dead code.


# 1.374 07-Feb-2018 maxv

Remove null check on ip, it can't be null. (Confuses code scanners.)


# 1.373 06-Feb-2018 maxv

Typos and style a bit, no functional change.


# 1.372 05-Feb-2018 maxv

Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.


# 1.371 05-Feb-2018 maxv

Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.


# 1.370 05-Feb-2018 maxv

Clean up this mess. This is typically the kind of places where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.


# 1.369 05-Feb-2018 maxv

Be tougher, and don't allow LSRR+SSRR (RFC7126).


# 1.368 05-Feb-2018 maxv

Kick duplicate options, they are not allowed (RFC791).


# 1.367 05-Feb-2018 maxv

Remove unused variable.


# 1.366 05-Feb-2018 maxv

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.


# 1.365 05-Feb-2018 maxv

Style, no functional change.


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, e.g., by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead
we have to get the object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleepable critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is a transitional
measure, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance-sensitive paths;
however, the overhead is not serious, a 2% drop at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can jump, e.g., via settimeofday(2), according to time_second(9). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.2; 1.319.4; 1.319.6; 1.319.10;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on IPsec being enabled and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

branches: 1.298.2; 1.298.6; 1.298.8;
check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.
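A receiver would typically enable the option with setsockopt(2) and then walk the control messages returned by recvmsg(2). The sketch below assumes the BSD convention described above (cmsg_level IPPROTO_IP, cmsg_type IP_RECVTTL, one uint8_t of data). Other systems may differ; Linux, for example, reports the TTL as an int under IP_TTL. The helper names are hypothetical.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Ask the kernel to attach the TTL to each datagram received on s. */
static int
enable_recvttl(int s)
{
	int on = 1;

	return setsockopt(s, IPPROTO_IP, IP_RECVTTL, &on, sizeof(on));
}

/* Return the TTL carried in msg's control data, or -1 if none is present. */
static int
received_ttl(struct msghdr *msg)
{
	struct cmsghdr *cm;

	for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == IPPROTO_IP &&
		    cm->cmsg_type == IP_RECVTTL &&
		    cm->cmsg_len >= CMSG_LEN(sizeof(uint8_t))) {
			uint8_t ttl;

			/* Copy the single TTL byte out of the control message. */
			memcpy(&ttl, CMSG_DATA(cm), sizeof(ttl));
			return ttl;
		}
	}
	return -1;
}
```

The caller points msg_control at a buffer of at least CMSG_SPACE(sizeof(uint8_t)) bytes before calling recvmsg(2).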


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
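Per-CPU counters kept as arrays of uint64_t (the representation introduced in 1.264) can be collated with a simple sum when sysctl asks for them. This sketch uses a fixed, hypothetical CPU count and a hypothetical subset of counter indices. It ignores details a kernel must handle, such as preemption during an increment and padding rows to avoid false sharing.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_NCPU	4	/* hypothetical CPU count */

/* Hypothetical subset of counter indices into the stats array. */
enum { IP_STAT_TOTAL, IP_STAT_BADSUM, IP_STAT_FORWARD, IP_NSTATS };

static uint64_t ipstat_percpu[SKETCH_NCPU][IP_NSTATS];

/* Fast path: each CPU updates only its own row, so no lock in this sketch. */
static void
ipstat_inc(unsigned cpu, unsigned stat)
{
	ipstat_percpu[cpu][stat]++;
}

/* Slow path (e.g. a sysctl read): sum every CPU's row into one array. */
static void
ipstat_collate(uint64_t out[IP_NSTATS])
{
	unsigned c, s;

	memset(out, 0, IP_NSTATS * sizeof(out[0]));
	for (c = 0; c < SKETCH_NCPU; c++)
		for (s = 0; s < IP_NSTATS; s++)
			out[s] += ipstat_percpu[c][s];
}
```

Because the collated result is a flat uint64_t array, it has the same shape no matter how many CPUs contributed to it.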


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on his analysis of the IPSEC
stuff and mine of the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
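One way to realize a Fisher-Yates-style shuffle over a sliding window is sketched below. It keeps a permutation of all 16-bit values, emits the entry under a moving cursor, and swaps it with a randomly chosen slot at most 32767 positions behind the cursor. This is an illustrative sketch, not the code from this revision. The names and window size are assumptions, and the xorshift generator only stands in for arc4random; it is not cryptographically secure.

```c
#include <stdint.h>

#define ID_SLOTS	65536u		/* one slot per 16-bit ID */
#define ID_WINDOW_MASK	0x7fffu		/* swap partner is 0..32767 slots back */

static uint16_t id_perm[ID_SLOTS];
static uint32_t id_cursor;		/* counts every slot visited */
static int id_ready;

/* Stand-in PRNG (xorshift32); NOT suitable for real IP IDs. */
static uint32_t
prng_next(void)
{
	static uint32_t x = 2463534242u;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return x;
}

static uint16_t
sketch_randomid(void)
{
	uint32_t i, j;
	uint16_t r;

	if (!id_ready) {
		for (i = 0; i < ID_SLOTS; i++)
			id_perm[i] = (uint16_t)i;	/* identity permutation */
		id_ready = 1;
	}
	do {
		i = id_cursor & (ID_SLOTS - 1);
		j = (id_cursor - (prng_next() & ID_WINDOW_MASK)) &
		    (ID_SLOTS - 1);
		r = id_perm[i];			/* value under the cursor */
		id_perm[i] = id_perm[j];	/* swap: slot j's value moves up */
		id_perm[j] = r;			/* and r is parked behind */
		id_cursor++;
	} while (r == 0);			/* never hand out ID 0 */
	return r;
}
```

An emitted ID is parked at most 32767 slots behind the cursor and only ever moves forward from there, so it cannot be handed out again for at least 32768 calls.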


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
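The rule from 1.258 and 1.259 amounts to a small predicate. 68 bytes is RFC 791's minimum that every module must forward without further fragmentation (a 60-byte maximum header plus 8 bytes of data). The function name and argument conventions below are hypothetical; this is not the check as written in ip_input.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_FRAG_BYTES	68	/* hypothetical name for the RFC 791 minimum */

/*
 * off: fragment offset in bytes (the header field times 8)
 * len: total length of this fragment, header included
 * mf:  the More Fragments flag
 * Returns true if the packet passes the minimum-fragment rule.
 */
static bool
frag_size_ok(uint32_t off, uint32_t len, bool mf)
{
	assert(len >= 20);		/* an IPv4 header alone is 20 bytes */

	if (off == 0 && !mf)
		return true;		/* not a fragment at all */
	if (off == 0)
		return len >= MIN_FRAG_BYTES;	/* first fragment: >= 68 bytes */
	return off >= MIN_FRAG_BYTES;	/* later ones may not start before 68 */
}
```

Since offsets are multiples of 8, the smallest non-initial offset this accepts is 72, which keeps a later fragment from overwriting the start of the transport header.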


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so the correctness of the statement
is obvious "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards, ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; its CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error with a datagram
packet that had an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialized during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
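The cached-limit check and the drop-half phase described above can be sketched roughly as follows. This is a simplified illustration, not the kernel's code: the variables only mirror the names in the message, and the queue is modelled as a flat array of fragment ids, whereas the real reassembly code drops whole reassembly queues.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel tunables named in the message. */
static int nmbclusters = 1024;   /* system-wide mbuf cluster limit */
static int ip_nmbclusters;       /* cached copy; 0 until first check */
static int ip_maxfrags;          /* derived drop-half threshold */

/* Recompute derived limits only when nmbclusters has changed. */
static void check_nmbcluster_params(void)
{
    if (ip_nmbclusters != nmbclusters) {
        ip_nmbclusters = nmbclusters;
        ip_maxfrags = nmbclusters / 4;
    }
}

/*
 * Simplified drop-half: if more than ip_maxfrags fragments are queued,
 * keep every other entry (indices 0, 2, 4, ...) and discard the rest.
 * Returns the number of fragments left in frags[].
 */
static size_t drop_half_if_needed(int *frags, size_t nfrags)
{
    size_t kept = 0;

    check_nmbcluster_params();
    if (nfrags <= (size_t)ip_maxfrags)
        return nfrags;
    for (size_t i = 0; i < nfrags; i += 2)
        frags[kept++] = frags[i];
    return kept;
}
```

Halving, rather than dropping one fragment at a time, is what makes the strategy multiplicative: a flood pushes the queue back well below the threshold in a single pass.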


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

back out previous, we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not a negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during the KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
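The length test can be expressed as a small predicate. This sketch assumes `ip_len` has already been converted to host order and `ip_hl` is the raw 4-bit header-length field (in 32-bit words). The `hlen < 20` rejection is a separate, pre-existing sanity check in IP input, included here only so the helper is self-contained.

```c
#include <stdint.h>

/*
 * Return 1 if the total length can hold the header it claims.
 * ip_len: total datagram length in bytes (host order).
 * ip_hl:  header length in 32-bit words.
 */
static int ip_len_ok(uint16_t ip_len, uint8_t ip_hl)
{
    unsigned hlen = (unsigned)ip_hl << 2;  /* words -> bytes */

    if (hlen < 20)          /* shorter than a minimal IPv4 header */
        return 0;
    return ip_len >= hlen;  /* reject ip_len < ip_hl << 2 */
}
```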


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
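Adjusting the checksum incrementally when the TTL drops by one follows the technique of RFC 1141/RFC 1624: the 16-bit word holding TTL and protocol shrinks by 0x0100, so the one's-complement checksum grows by 0x0100 with end-around carry. The sketch below treats the header as host-order 16-bit words for clarity; it is not the ipflow code, which operates on network-order fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Full Internet checksum (RFC 1071) over n host-order 16-bit words. */
static uint16_t in_cksum_words(const uint16_t *w, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += w[i];
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Decrement TTL and patch the checksum without re-summing the header.
 * hdr[4] = TTL (high byte) | protocol (low byte); hdr[5] = checksum.
 */
static void ttl_decrement(uint16_t *hdr)
{
    uint32_t sum;

    hdr[4] -= 0x0100;                       /* TTL - 1 */
    sum = (uint32_t)hdr[5] + 0x0100;        /* checksum + 0x0100 */
    hdr[5] = (uint16_t)(sum + (sum >> 16)); /* end-around carry */
}
```

The incremental form touches two words instead of ten, which is what makes it attractive on a fast-forwarding path.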


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change the list of interface IP addresses to a hash table. This significantly improves performance on hosts with a large number of IP addresses.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
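A minimal sketch of the bounds described above. It assumes that IPPORT_RESERVED is 1024, as in the BSD headers, and that IPNOPRIVPORTS lowers the floor to 0. The function name and the `noprivports` flag are hypothetical; this is not the sysctl handler from this revision.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_IPPORT_RESERVED 1024   /* IPPORT_RESERVED in BSD <netinet/in.h> */
#define SKETCH_PORT_MAX        65535

/*
 * Hypothetical validator for a proposed (anonportmin, anonportmax) pair:
 * both ends must lie within [lowest, 65535], where lowest is
 * IPPORT_RESERVED unless privileged-port checks are disabled, and the
 * minimum must be strictly below the maximum.
 */
static bool
anonport_range_ok(int min, int max, bool noprivports)
{
    int lowest = noprivports ? 0 : SKETCH_IPPORT_RESERVED;

    if (min < lowest || max < lowest)
        return false;
    if (min > SKETCH_PORT_MAX || max > SKETCH_PORT_MAX)
        return false;
    return min < max;
}
```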


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>
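The check can be sketched as follows, assuming IP_MAXPACKET is 65535 (its value in <netinet/ip.h>). The fragment offset field counts 8-byte units. As a result, a final fragment with a large offset plus its payload can describe a datagram longer than the 16-bit ip_len field can hold. The helper below is hypothetical and simply rejects that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IP_MAXPACKET 65535u   /* value of IP_MAXPACKET in <netinet/ip.h> */

/*
 * Hypothetical check: would this fragment extend the reassembled datagram
 * beyond IP_MAXPACKET?  hlen is the header length in bytes, off13 the
 * 13-bit fragment offset field (8-byte units), datalen the fragment's
 * payload length in bytes.  The sum is computed in 32 bits so it cannot wrap.
 */
static bool
frag_len_ok(uint32_t hlen, uint16_t off13, uint32_t datalen)
{
    uint32_t total = hlen + (uint32_t)(off13 & 0x1fff) * 8u + datalen;

    return total <= SKETCH_IP_MAXPACKET;
}
```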


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
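The difference can be sketched as two hypothetical predicates on a host-order ip_off, using the flag values from the BSD <netinet/ip.h> (IP_RF 0x8000, IP_DF 0x4000, IP_MF 0x2000, IP_OFFMASK 0x1fff). The "naive" version shows one plausible form of the assumption described above. It is not the actual pre-1.37 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IP_RF      0x8000   /* reserved fragment flag */
#define SKETCH_IP_DF      0x4000   /* don't fragment */
#define SKETCH_IP_MF      0x2000   /* more fragments */
#define SKETCH_IP_OFFMASK 0x1fff   /* fragment offset bits */

/*
 * Treats every bit except DF as "this is a fragment", so a packet with
 * the reserved bit set is wrongly sent to reassembly.
 */
static bool
is_fragment_naive(uint16_t ip_off)
{
    return (ip_off & ~SKETCH_IP_DF) != 0;
}

/* Looks only at MF and the offset, effectively masking out both DF and RF. */
static bool
is_fragment(uint16_t ip_off)
{
    return (ip_off & (SKETCH_IP_MF | SKETCH_IP_OFFMASK)) != 0;
}
```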


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlaid with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.374 07-Feb-2018 maxv

Remove null check on ip, it can't be null. (Confuses code scanners.)


# 1.373 06-Feb-2018 maxv

Typos and style a bit, no functional change.


# 1.372 05-Feb-2018 maxv

Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.


# 1.371 05-Feb-2018 maxv

Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.


# 1.370 05-Feb-2018 maxv

Clean up this mess. This is typically the kind of places where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.


# 1.369 05-Feb-2018 maxv

Be tougher, and don't allow LSRR+SSRR (RFC7126).


# 1.368 05-Feb-2018 maxv

Kick duplicate options, they are not allowed (RFC791).
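The rules from 1.368 and 1.369 can be sketched as a single pass over the options area. This is a simplified, hypothetical validator, not NetBSD's actual option-processing code. It assumes the usual option encoding: EOL (0) and NOP (1) are single bytes, and every other option carries a length byte. NOP may repeat as padding, and LSRR and SSRR are option numbers 131 and 137.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IPOPT_EOL  0
#define SKETCH_IPOPT_NOP  1
#define SKETCH_IPOPT_LSRR 131
#define SKETCH_IPOPT_SSRR 137

/* Returns false for malformed, duplicated, or LSRR+SSRR option sets. */
static bool
ip_options_acceptable(const uint8_t *cp, size_t cnt)
{
    bool seen[256] = { false };

    while (cnt > 0) {
        uint8_t opt = cp[0];
        size_t optlen;

        if (opt == SKETCH_IPOPT_EOL)
            break;
        if (opt == SKETCH_IPOPT_NOP) {
            optlen = 1;                 /* padding may repeat */
        } else {
            if (cnt < 2)
                return false;           /* no room for the length byte */
            optlen = cp[1];
            if (optlen < 2 || optlen > cnt)
                return false;           /* malformed length */
            if (seen[opt])
                return false;           /* duplicate option (RFC 791) */
            seen[opt] = true;
        }
        cp += optlen;
        cnt -= optlen;
    }
    /* LSRR and SSRR are distinct option numbers; reject the pair explicitly. */
    return !(seen[SKETCH_IPOPT_LSRR] && seen[SKETCH_IPOPT_SSRR]);
}
```

The duplicate check alone would not catch LSRR together with SSRR, because they are different options. The explicit test at the end covers that case.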


# 1.367 05-Feb-2018 maxv

Remove unused variable.


# 1.366 05-Feb-2018 maxv

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.


# 1.365 05-Feb-2018 maxv

Style, no functional change.


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and add a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies of SO_TIMESTAMP/SO_OTIMESTAMP processing into a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, e.g., by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for the MP-safe routing table. It is kept
separate to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer to an interface in an mbuf

Having a pointer to an interface in an mbuf isn't safe once we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead,
we have to get the object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref, which use psref(9)
for sleepable critical sections, and m_{get,put}_rcvif, which use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is a transitional
measure, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance-sensitive paths;
however, the overhead is not serious: 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, e.g., via settimeofday(2), according to time_second(9). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
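Converting between the two time bases amounts to adding or subtracting the current offset between the wall clock and the uptime clock. In this minimal sketch, `now_wall` and `now_up` stand in for the kernel's time_second and time_uptime sampled at the same moment. The function names are hypothetical.

```c
#include <assert.h>
#include <time.h>

/*
 * Convert an uptime-based timestamp (e.g. a cache expiry) to wall-clock
 * time for a userland consumer that expects time_second semantics.
 */
static time_t
uptime_to_wall(time_t t_up, time_t now_wall, time_t now_up)
{
    return t_up + (now_wall - now_up);
}

/* Convert a wall-clock time supplied by userland back to the uptime base. */
static time_t
wall_to_uptime(time_t t_wall, time_t now_wall, time_t now_up)
{
    return t_wall - (now_wall - now_up);
}
```

Because expiries are stored in the uptime base, a settimeofday(2) step only changes the value reported to userland. It does not change when the entry actually expires.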


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 duplicate address detection (DAD) via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on whether IPsec is enabled and
whether any rules exist.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clauses 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
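A minimal sketch of the per-CPU counter pattern described above, assuming a fixed CPU count and a hypothetical subset of counters stored as an array of uint64_t (in the spirit of revision 1.264). It is not NetBSD's percpu(9) code; the names ipstat_inc(), ipstat_collate(), and IPS_* are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of IP counters, indexed as an array. */
enum { IPS_TOTAL, IPS_BADSUM, IPS_FORWARD, IPS_NSTATS };

#define NCPU_DEMO 4	/* stand-in for the number of CPUs */

/* One counter row per CPU.  Each CPU writes only its own row, so an
 * increment needs no lock shared with other CPUs (a kernel must still
 * keep the thread on that CPU for the duration of the increment). */
static uint64_t ipstat_percpu[NCPU_DEMO][IPS_NSTATS];

static void ipstat_inc(unsigned cpu, unsigned stat)
{
	ipstat_percpu[cpu][stat]++;
}

/* Sum all rows on demand, e.g. when a sysctl request asks for stats. */
static void ipstat_collate(uint64_t out[IPS_NSTATS])
{
	memset(out, 0, IPS_NSTATS * sizeof(out[0]));
	for (unsigned cpu = 0; cpu < NCPU_DEMO; cpu++)
		for (unsigned s = 0; s < IPS_NSTATS; s++)
			out[s] += ipstat_percpu[cpu][s];
}
```

The trade-off is that reads become more expensive (a sum over all CPUs), which suits counters that are bumped per packet but read rarely.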


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
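The log names the technique but not its details, so the following is only one way a Fisher-Yates shuffle over a sliding window can work. A permutation of all 65536 ID values is kept; the slots just behind the cursor hold the last `window` IDs handed out and are excluded, and each new ID is swapped in from a uniformly chosen slot among the rest. Per the XXX note, the real code drew randomness from arc4random; this sketch substitutes a xorshift64 generator purely to be self-contained, which makes it unsuitable wherever unpredictability matters. All names are hypothetical.

```c
#include <stdint.h>

#define ID_SPACE 65536u		/* every 16-bit IPv4 ID value */

struct idgen {			/* hypothetical generator state */
	uint16_t perm[ID_SPACE];	/* a permutation of 0..65535 */
	uint32_t pos;			/* next output slot */
	uint32_t window;		/* IDs never repeat within this many outputs */
	uint64_t rng;			/* xorshift64 state (placeholder RNG) */
};

static uint64_t xorshift64(uint64_t *s)
{
	uint64_t x = *s;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return *s = x;
}

static void idgen_init(struct idgen *g, uint32_t window, uint64_t seed)
{
	for (uint32_t i = 0; i < ID_SPACE; i++)
		g->perm[i] = (uint16_t)i;
	g->pos = 0;
	g->window = window;		/* must be < ID_SPACE */
	g->rng = seed ? seed : 1;	/* xorshift state must be non-zero */
}

static uint16_t idgen_next(struct idgen *g)
{
	/* Slots pos-window .. pos-1 (mod ID_SPACE) hold the most recent
	 * IDs; the ID_SPACE - window slots starting at pos are eligible.
	 * Pick one of them (modulo bias is negligible here). */
	uint32_t r = (uint32_t)(xorshift64(&g->rng) % (ID_SPACE - g->window));
	uint32_t j = (g->pos + r) % ID_SPACE;
	uint16_t id = g->perm[j];

	/* Fisher-Yates step: move the chosen value into slot pos. */
	g->perm[j] = g->perm[g->pos];
	g->perm[g->pos] = id;
	g->pos = (g->pos + 1) % ID_SPACE;
	return id;
}
```

Because swaps never touch the excluded slots, an ID can reappear only after more than `window` other IDs have been issued.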


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
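A minimal sketch of the two checks (this revision plus the follow-up in 1.259). The fragment offset field counts 8-byte units, so "at least 68 bytes" means an offset field of at least 9 (72 bytes). Whether NetBSD measured the first fragment's length including the IP header is an assumption here: ip_len is taken as the total datagram length. frag_acceptable() is a hypothetical name; the flag values match <netinet/ip.h>.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MF		0x2000	/* more-fragments flag */
#define IP_OFFMASK	0x1fff	/* fragment offset, in 8-byte units */
#define IP_MINFRAGSIZE	68	/* RFC 791 minimum forwardable datagram */

/* ip_off and ip_len in host byte order; ip_len includes the header. */
static bool frag_acceptable(uint16_t ip_off, uint16_t ip_len)
{
	uint32_t off = (uint32_t)(ip_off & IP_OFFMASK) * 8;	/* bytes */
	bool mf = (ip_off & IP_MF) != 0;

	if (off == 0 && !mf)
		return true;	/* not a fragment at all */
	if (off != 0 && off < IP_MINFRAGSIZE)
		return false;	/* non-initial fragment overlaps the first 68 bytes */
	if (off == 0 && ip_len < IP_MINFRAGSIZE)
		return false;	/* first fragment shorter than 68 bytes */
	return true;
}
```

Refusing both cases means the first 68 bytes of a datagram (room for a maximal IP header plus the start of the transport header) always arrive in one piece and cannot be overwritten by a later fragment.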


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement's correctness
is apparent "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route': call rtflush() on the source
and rtcache() on the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error() with a
datagram that had an incorrect checksum. (From Liam Foy)
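Decrementing the TTL modifies the IP header, so the header checksum must change with it; a header that is passed on after a bare decrement no longer verifies. A minimal sketch of one standard way to keep the two consistent, the incremental update of RFC 1624 (eqn. 3, HC' = ~(~HC + ~m + m')), checked against a full RFC 1071 recomputation. The header is held as ten 16-bit words in network order; all function names are hypothetical and this is not NetBSD's ip_forward() code.

```c
#include <stddef.h>
#include <stdint.h>

/* Full one's-complement checksum over nwords 16-bit words
 * (the checksum word itself must be zero). */
static uint16_t cksum16(const uint16_t *w, size_t nwords)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < nwords; i++)
		sum += w[i];
	while (sum >> 16)			/* fold end-around carries */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: update checksum hc when word m becomes m_new. */
static uint16_t cksum_update(uint16_t hc, uint16_t m, uint16_t m_new)
{
	uint32_t sum = (uint32_t)(uint16_t)~hc + (uint16_t)~m + m_new;

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Word 4 is (TTL << 8) | protocol and word 5 is the checksum.  The
 * caller is assumed to have already rejected TTL <= 1. */
static void ttl_decrement(uint16_t hdr[10])
{
	uint16_t old = hdr[4];

	hdr[4] = (uint16_t)(old - 0x0100);	/* TTL is the high byte */
	hdr[5] = cksum_update(hdr[5], old, hdr[4]);
}
```

For the sample header used in the test (a well-known example whose checksum is 0xb861), the incremental result after one decrement equals a from-scratch recomputation.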


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpacketsr are now declared static
and uninitialized (bss), to discourage tampering with them.
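A minimal sketch of the bookkeeping these revisions describe: a hash table of reassembly queues (1.190), a per-queue fragment count (1.192), a global count, and a drop-half phase once the count exceeds a limit such as nmbclusters/4 (1.194). The log does not say which fragments the drop-half phase chooses, so this sketch simply evicts whole datagram queues in bucket order until at least half of the held fragments are gone. The hash function, struct layouts, and names are hypothetical, and real queues would also hold the mbufs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 64		/* hypothetical; must be a power of two */

struct fragq {			/* one datagram under reassembly (simplified) */
	struct fragq *next;
	uint32_t src, dst;
	uint16_t id;
	uint8_t proto;
	unsigned nfrags;	/* fragments held for this datagram */
};

struct reass {
	struct fragq *bucket[NBUCKETS];
	unsigned nfrags;	/* total fragments across all queues */
	unsigned maxfrags;	/* e.g. nmbclusters / 4 */
};

static unsigned reass_hash(uint32_t src, uint32_t dst, uint16_t id,
    uint8_t proto)
{
	uint32_t h = src ^ dst ^ ((uint32_t)id << 8) ^ proto;

	h ^= h >> 16;
	return h & (NBUCKETS - 1);
}

/* Evict whole queues until at least half the fragments are gone. */
static unsigned reass_drop_half(struct reass *r)
{
	unsigned target = r->nfrags / 2, dropped = 0;

	for (unsigned b = 0; b < NBUCKETS && dropped < target; b++) {
		while (r->bucket[b] != NULL && dropped < target) {
			struct fragq *q = r->bucket[b];

			r->bucket[b] = q->next;
			dropped += q->nfrags;
			r->nfrags -= q->nfrags;
			free(q);
		}
	}
	return dropped;
}

/* Account for one arriving fragment; returns -1 on allocation failure. */
static int reass_add(struct reass *r, uint32_t src, uint32_t dst,
    uint16_t id, uint8_t proto)
{
	unsigned b = reass_hash(src, dst, id, proto);
	struct fragq *q;

	for (q = r->bucket[b]; q != NULL; q = q->next)
		if (q->src == src && q->dst == dst && q->id == id &&
		    q->proto == proto)
			break;
	if (q == NULL) {
		if ((q = calloc(1, sizeof(*q))) == NULL)
			return -1;
		q->src = src;
		q->dst = dst;
		q->id = id;
		q->proto = proto;
		q->next = r->bucket[b];
		r->bucket[b] = q;
	}
	q->nfrags++;
	r->nfrags++;
	if (r->nfrags > r->maxfrags)
		reass_drop_half(r);
	return 0;
}
```

Dropping a multiplicative fraction, rather than one queue at a time, brings the count well below the limit in a single pass, so a flood of fragments does not trigger the drop phase on every new arrival.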


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add an IP(v4)-specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

back out previous; we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port numbers assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during the KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets, that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
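The length check named above can be sketched in C. This is a minimal illustration, not the ip_input() code: the helper name `ip_lengths_ok` is hypothetical, `ip_len` is assumed to be in host byte order, and the version, minimum-header and truncation checks are included only for context as standard RFC 791 sanity checks, not as a claim about how ip_input() orders them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: basic IPv4 length sanity checks.
 * ver_ihl is the first header byte; ip_len is the total length in host
 * order; pkt_len is the number of bytes actually received.
 */
static bool
ip_lengths_ok(uint8_t ver_ihl, uint16_t ip_len, size_t pkt_len)
{
	/* ip_hl counts 32-bit words, so shift left by 2 to get bytes */
	unsigned int hlen = (unsigned int)(ver_ihl & 0x0f) << 2;

	if ((ver_ihl >> 4) != 4)
		return false;		/* not IPv4 */
	if (hlen < 20)
		return false;		/* shorter than a minimal header */
	if (ip_len < hlen)
		return false;		/* the ip_len < ip_hl << 2 case */
	if (pkt_len < ip_len)
		return false;		/* truncated packet */
	return true;
}
```

Without the `ip_len < hlen` test, a later `ip_len - hlen` computation on unsigned values would wrap to a huge payload length instead of rejecting the packet.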


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
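The incremental checksum adjustment mentioned above can be sketched in C. This is a minimal illustration, not the ipflow code itself: it assumes the IPv4 header is held as ten 16-bit words whose values equal the big-endian wire words (so the TTL is the high byte of word 4 and the checksum is word 5), it uses the RFC 1624 update formula HC' = ~(~HC + ~m + m'), and all helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t
fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Full header checksum. With the checksum word zeroed this yields the
 * value to store; over a header with a valid checksum it yields 0.
 */
static uint16_t
cksum_full(const uint16_t *w, size_t nwords)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < nwords; i++)
		sum += w[i];
	return (uint16_t)~fold16(sum);
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'), where word m becomes m_new. */
static uint16_t
cksum_update(uint16_t hc, uint16_t m, uint16_t m_new)
{
	uint32_t sum = (uint16_t)~hc;

	sum += (uint16_t)~m;
	sum += m_new;
	return (uint16_t)~fold16(sum);
}

/* Decrement the TTL by one and patch the checksum without re-summing. */
static void
decrement_ttl(uint16_t hdr[10])
{
	uint16_t old = hdr[4];

	assert((old >> 8) > 0);			/* caller must drop TTL == 0 */
	hdr[4] = (uint16_t)(old - 0x0100);	/* TTL is the high byte */
	hdr[5] = cksum_update(hdr[5], old, hdr[4]);
}
```

Because the update only subtracts the old word and adds the new one, the result matches a full recomputation: for a header whose checksum is 0xb861 with TTL 0x40, decrementing the TTL to 0x3f yields 0xb961, without touching the other eight words.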


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
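The masking described above can be illustrated with a short C sketch. The flag values match the traditional BSD <netinet/ip.h> definitions; the helper `is_fragment` is hypothetical and assumes ip_off has already been converted to host byte order, which is not how every revision of ip_input.c stores the field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IP_RF		0x8000	/* reserved fragment flag */
#define IP_DF		0x4000	/* don't fragment */
#define IP_MF		0x2000	/* more fragments */
#define IP_OFFMASK	0x1fff	/* fragment offset, in 8-byte units */

/*
 * Is this packet a fragment? Only MF and the offset matter. A test such
 * as (ip_off & ~IP_DF) would wrongly treat a packet with just the
 * reserved bit set as a fragment.
 */
static bool
is_fragment(uint16_t ip_off)
{
	return (ip_off & (IP_MF | IP_OFFMASK)) != 0;
}
```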


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect, though; at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.373 06-Feb-2018 maxv

Typos and style a bit, no functional change.


# 1.372 05-Feb-2018 maxv

Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.


# 1.371 05-Feb-2018 maxv

Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.


# 1.370 05-Feb-2018 maxv

Clean up this mess. This is typically the kind of places where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.


# 1.369 05-Feb-2018 maxv

Be tougher, and don't allow LSRR+SSRR (RFC7126).


# 1.368 05-Feb-2018 maxv

Kick duplicate options, they are not allowed (RFC791).


# 1.367 05-Feb-2018 maxv

Remove unused variable.


# 1.366 05-Feb-2018 maxv

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.


# 1.365 05-Feb-2018 maxv

Style, no functional change.


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is
separated out to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer to an interface in an mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead
we have to get the object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and serves as a
transition moratorium, i.e., it is intended to be used in places that are
not planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetics that lead to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and I for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
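As an illustration only, the rule described in 1.258 and 1.259 can be sketched
as a standalone check. This is not the NetBSD code: the function name
frag_is_acceptable() and the constant IP_MINFRAGSIZE are hypothetical, and the
sketch assumes ip_off and ip_len have already been converted to host byte
order. The value 68 comes from RFC 791, which requires every internet module to
forward a 68-octet datagram (a 60-octet maximum header plus an 8-octet minimum
fragment) without further fragmentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IP_MF           0x2000u /* more-fragments flag in ip_off */
#define IP_OFFMASK      0x1fffu /* fragment offset field, in 8-byte units */
#define IP_MINFRAGSIZE  68u     /* hypothetical name: RFC 791 minimum size */

/*
 * Hypothetical check. ip_off is the ip_off field and ip_len the total
 * datagram length (header + data), both in host byte order.
 * Returns false for fragments the policy in the log would reject.
 */
static bool
frag_is_acceptable(uint16_t ip_off, uint16_t ip_len)
{
	uint32_t off_bytes = (uint32_t)(ip_off & IP_OFFMASK) << 3;
	bool more = (ip_off & IP_MF) != 0;

	if (off_bytes == 0 && !more)
		return true;			/* not a fragment at all */
	if (off_bytes == 0)
		return ip_len >= IP_MINFRAGSIZE; /* first fragment: >= 68 bytes */
	/* non-initial fragment: may not start below byte 68 */
	return off_bytes >= IP_MINFRAGSIZE;
}
```

Because the offset field counts 8-byte units, a non-initial fragment with an
offset field of 8 (64 bytes) is rejected, while one with an offset field of 9
(72 bytes) is accepted.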


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC, spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the correctness of the statement
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent, though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route': call rtflush() on the source
and rtcache() on the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or data structure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still a work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; its CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error with a datagram
packet that had an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are those that are either in arch-specific
code or not initialized during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When we are under pressure for mbufs or have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
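A minimal sketch of the drop-half policy and the cached-nmbclusters check
described above, assuming a simple oldest-first list of reassembly entries.
Only ip_nmbclusters and ip_maxfrags come from the log; struct reass_entry,
check_nmbcluster_params() (loosely modeled on CHECK_NMBCLUSTER_PARAMS()),
drop_half() and maybe_drop() are hypothetical names, and the real kernel may
choose which entries to drop differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reassembly-queue entry: one partially reassembled datagram. */
struct reass_entry {
	int nfrags;			/* fragments held for this datagram */
	struct reass_entry *next;	/* next (younger) entry; oldest first */
};

/* Cached copy of nmbclusters and the derived threshold (names from the log). */
static int ip_nmbclusters = -1;
static int ip_maxfrags;

/* Recompute derived limits only when nmbclusters has changed. */
static void
check_nmbcluster_params(int nmbclusters)
{
	if (nmbclusters != ip_nmbclusters) {
		ip_nmbclusters = nmbclusters;
		ip_maxfrags = nmbclusters / 4;
	}
}

/*
 * Multiplicative drop: unlink whole entries, oldest first, until at most
 * half of the queued fragments remain. Returns the new fragment total.
 * (Hypothetical: entries are unlinked, freeing is left to the caller.)
 */
static int
drop_half(struct reass_entry **head, int total)
{
	int target = total / 2;

	assert(total >= 0);
	while (*head != NULL && total > target) {
		struct reass_entry *victim = *head;

		*head = victim->next;
		total -= victim->nfrags;
	}
	return total;
}

/* On fragment arrival: refresh limits, then drop half if over ip_maxfrags. */
static int
maybe_drop(struct reass_entry **head, int total, int nmbclusters)
{
	check_nmbcluster_params(nmbclusters);
	if (total > ip_maxfrags)
		total = drop_half(head, total);
	return total;
}
```

Because whole datagrams are dropped, the sketch may discard somewhat more than
half of the fragments; a drop fraction other than 50% would only change how
target is computed.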


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
'struct ifq' statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

backout previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONSTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameters to modern values.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have a proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove_hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase an "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets, that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.364 01-Jan-2018 christos

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
  - If it's non-zero, turn on the IP_RECVPKTINFO option.
  - If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and add a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo


Revision tags: tls-maxphys-base-20171202
# 1.363 24-Nov-2017 roy

Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.


# 1.362 17-Nov-2017 ozaki-r

Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies of SO_TIMESTAMP/SO_OTIMESTAMP processing into a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes it difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later if we confirmed
nobody does actually.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetics that lead to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and I for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC, spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating, and
freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
families of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
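The sockaddr life cycle from items 1 and 3 can be modelled in userland. This is a hypothetical sketch. calloc()/free() stand in for the per-family pools. The struct sa_hdr layout and the FAM_* codes are invented for illustration. The assumption that cache_setdst() also forgets the cached route is not stated in the log.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical family codes (arbitrary values); the kernel uses AF_*. */
enum { FAM_INET = 1, FAM_INET6 = 2 };

struct sa_hdr {
	uint8_t	sa_len;		/* total length of this sockaddr */
	uint8_t	sa_family;	/* address family */
	uint8_t	sa_data[];	/* family-specific bytes */
};

static size_t
family_len(uint8_t af)
{
	switch (af) {
	case FAM_INET:	return 16;	/* sizeof(struct sockaddr_in) on BSD */
	case FAM_INET6:	return 28;	/* sizeof(struct sockaddr_in6) */
	default:	return 0;
	}
}

/* Like sockaddr_alloc(): right size, sa_len and sa_family filled in. */
static struct sa_hdr *
sa_alloc(uint8_t af)
{
	size_t len = family_len(af);
	struct sa_hdr *sa;

	if (len == 0 || (sa = calloc(1, len)) == NULL)
		return NULL;		/* unknown family or "pool" exhausted */
	sa->sa_len = (uint8_t)len;
	sa->sa_family = af;
	return sa;
}

/* Like sockaddr_copy(): the families must match. */
static struct sa_hdr *
sa_copy(struct sa_hdr *dst, const struct sa_hdr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like sockaddr_dup(): allocate, then copy. */
static struct sa_hdr *
sa_dup(const struct sa_hdr *src)
{
	struct sa_hdr *dst = sa_alloc(src->sa_family);

	return dst == NULL ? NULL : sa_copy(dst, src);
}

/* Like sockaddr_free(). */
static void
sa_free(struct sa_hdr *sa)
{
	free(sa);
}

/* A route cache that points at external sockaddr storage. */
struct route_cache {
	struct sa_hdr	*ro_sa;	/* destination, or NULL */
	void		*ro_rt;	/* cached route (unused in this sketch) */
};

/* Like rtcache_setdst(): 0 on success, ENOMEM if no storage. */
static int
cache_setdst(struct route_cache *ro, const struct sa_hdr *dst)
{
	struct sa_hdr *sa = sa_dup(dst);

	if (sa == NULL)
		return ENOMEM;
	sa_free(ro->ro_sa);
	ro->ro_sa = sa;
	ro->ro_rt = NULL;	/* assumption: a new destination drops the route */
	return 0;
}

/* Like rtcache_getdst(): may return NULL, so callers must check. */
static const struct sa_hdr *
cache_getdst(const struct route_cache *ro)
{
	return ro->ro_sa;
}
```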


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement's correctness
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.
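The revalidation contract of rtcache_check (afterwards ro_rt is either valid or NULL) can be sketched with hypothetical stand-in types. A one-entry "table" replaces the real routing lookup, and a plain counter models the rtentry reference count. None of these names are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct rtentry and struct route. */
struct rtentry_s {
	bool	up;		/* plays the role of RTF_UP */
	int	refcnt;
};

struct cached_route {
	struct rtentry_s *ro_rt;
};

/* Hypothetical lookup in a one-entry "table": only usable routes are returned. */
static struct rtentry_s *
lookup(struct rtentry_s *table_entry)
{
	return (table_entry != NULL && table_entry->up) ? table_entry : NULL;
}

/*
 * If the cached route is gone or down, release it and look it up again.
 * Afterwards ro_rt is either a usable route or NULL, so one NULL test
 * tells the caller whether a fresh lookup is needed.
 */
static void
cache_check(struct cached_route *ro, struct rtentry_s *table_entry)
{
	if (ro->ro_rt != NULL && ro->ro_rt->up)
		return;				/* still valid */
	if (ro->ro_rt != NULL) {
		ro->ro_rt->refcnt--;		/* drop the stale reference */
		ro->ro_rt = NULL;
	}
	ro->ro_rt = lookup(table_entry);
	if (ro->ro_rt != NULL)
		ro->ro_rt->refcnt++;
	assert(ro->ro_rt == NULL || ro->ro_rt->up);
}
```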

Adjust callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent, though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
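The chaining scheme described above can be sketched with a hand-rolled singly linked list. This is hypothetical code. chain_add(), chain_flush() and chain_flushall() play the roles of in_rtcache(), in_rtflush() and in_rtflushall(), and a single global chain stands in for the per-domain list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical route cache, linked on a chain while it holds a route. */
struct rcache {
	void		*ro_rt;		/* cached route, NULL if none */
	struct rcache	*next;		/* next cache on the chain */
	bool		 on_chain;
};

static struct rcache *chain_head;	/* one chain per family in the kernel */

/* Like in_rtcache(): record that 'ro' now caches 'rt'. */
static void
chain_add(struct rcache *ro, void *rt)
{
	assert(!ro->on_chain);
	ro->ro_rt = rt;
	ro->next = chain_head;
	chain_head = ro;
	ro->on_chain = true;
}

/* Like in_rtflush(): forget the route and unlink the cache. */
static void
chain_flush(struct rcache *ro)
{
	struct rcache **pp;

	for (pp = &chain_head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == ro) {
			*pp = ro->next;
			break;
		}
	}
	ro->ro_rt = NULL;
	ro->next = NULL;
	ro->on_chain = false;
}

/* Like in_rtflushall(): invalidate every cache on the chain. */
static void
chain_flushall(void)
{
	while (chain_head != NULL)
		chain_flush(chain_head);
}
```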

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error() with a
datagram whose checksum was incorrect. (From Liam Foy)
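The ordering fix can be sketched as follows. The route check comes first, and only then is the TTL decremented and the header checksum patched. If forwarding fails, the header stays intact for the ICMP error. The sketch models the IPv4 header as ten 16-bit words and uses the incremental checksum update from RFC 1624, eqn. 3. The function names and the have_route flag are hypothetical; this is not the kernel's ip_forward().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Header modelled as ten 16-bit words, values as read in network order. */
#define W_TTLPROTO	4	/* high byte TTL, low byte protocol */
#define W_CKSUM		5

/* Full one's-complement header checksum (checksum word skipped). */
static uint16_t
hdr_cksum(const uint16_t w[10])
{
	uint32_t sum = 0;
	int i;

	for (i = 0; i < 10; i++)
		if (i != W_CKSUM)
			sum += w[i];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'). */
static uint16_t
cksum_update(uint16_t hc, uint16_t m, uint16_t m_new)
{
	uint32_t sum = (uint16_t)~hc + (uint16_t)~m + (uint32_t)m_new;

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Returns false, leaving the header untouched, if there is no route or
 * the TTL would expire; only a forwardable packet is modified.
 */
static bool
forward_ttl(uint16_t w[10], bool have_route)
{
	uint16_t old = w[W_TTLPROTO];

	if (!have_route || (old >> 8) <= 1)
		return false;
	w[W_TTLPROTO] = (uint16_t)(old - 0x0100);	/* TTL is the high byte */
	w[W_CKSUM] = cksum_update(w[W_CKSUM], old, w[W_TTLPROTO]);
	return true;
}
```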


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0
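Why the index must be signed: with an unsigned index, "i >= 0" is always true, and decrementing past zero wraps to UINT_MAX, so a count-down loop never terminates. The function below is a generic illustration, not the ip_reass_ttl_descr loop itself.

```c
#include <assert.h>

/* Sum v[n-1] .. v[0]; 'i' must be signed for the >= 0 test to end the loop. */
static int
sum_descending(const int *v, int n)
{
	int i, total = 0;

	assert(n >= 0);
	for (i = n - 1; i >= 0; i--)
		total += v[i];
	return total;
}
```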


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under mbuf pressure, or when there are too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
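The cached-limit check and the drop-half policy can be sketched together. This is a simplified, hypothetical model. check_nmbcluster_params() is named after the CHECK_NMBCLUSTER_PARAMS() macro mentioned in 1.195, but its body here is an assumption. Fragments are modelled as a bare count; the kernel drops real fragments from the reassembly queues.

```c
#include <assert.h>

/* Hypothetical stand-ins for the variables named in the log. */
static int nmbclusters = 1024;		/* system-wide cluster limit */
static int ip_nmbclusters;		/* cached copy of nmbclusters */
static int ip_maxfrags;			/* derived: nmbclusters / 4 */
static int ip_nfrags;			/* fragments currently queued */

/* Recompute derived limits only when the cached copy is stale. */
static void
check_nmbcluster_params(void)
{
	assert(nmbclusters > 0);
	if (ip_nmbclusters != nmbclusters) {
		ip_nmbclusters = nmbclusters;
		ip_maxfrags = nmbclusters / 4;
	}
}

/*
 * Admit one new fragment. If the queue is at or over the limit, first
 * drop half of the queued fragments. Returns how many were dropped.
 */
static int
frag_admit(void)
{
	int dropped = 0;

	check_nmbcluster_params();
	if (ip_nfrags >= ip_maxfrags) {
		dropped = ip_nfrags / 2;
		ip_nfrags -= dropped;
	}
	ip_nfrags++;
	return dropped;
}
```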


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.
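All fragments of one datagram share the same source, destination, IP id and protocol. Hashing those fields therefore sends every fragment to the same bucket, and only that bucket's list needs to be searched. The bucket count and mixing function below are hypothetical, not the ones in ip_input.c or FreeBSD.

```c
#include <assert.h>
#include <stdint.h>

#define IPREASS_NHASH	64	/* hypothetical bucket count, a power of two */

/* Fold the fields that identify a datagram into a bucket index. */
static unsigned
ipreass_hash(uint32_t src, uint32_t dst, uint16_t id, uint8_t proto)
{
	uint32_t h = src ^ dst ^ (((uint32_t)id << 16) | proto);

	assert((IPREASS_NHASH & (IPREASS_NHASH - 1)) == 0);
	h ^= h >> 16;
	h ^= h >> 8;
	return h & (IPREASS_NHASH - 1);
}
```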


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
'struct ifq' statistics. Add IP(v4)-specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

back out previous; we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
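The align-if-needed path can be sketched in userland. Test the pointer against the type's alignment, and copy the header into aligned storage only when it is misaligned. struct hdr20, hdr_aligned_p() and hdr_access() are hypothetical. The kernel instead uses the *_HDR_ALIGNED_P() macros, which per the log expand to 1 under __NO_STRICT_ALIGNMENT, together with m_copyup() on mbufs. This sketch always performs the check.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 20-byte header with 32-bit members. */
struct hdr20 {
	uint32_t w[5];
};

/* True if 'p' may be accessed directly as a struct hdr20. */
static int
hdr_aligned_p(const void *p)
{
	return ((uintptr_t)p & (alignof(struct hdr20) - 1)) == 0;
}

/*
 * Return a pointer through which the header can be read safely: 'p'
 * itself if aligned, else 'scratch' after copying the bytes into it.
 */
static const struct hdr20 *
hdr_access(const void *p, struct hdr20 *scratch)
{
	if (hdr_aligned_p(p))
		return p;
	memcpy(scratch, p, sizeof(*scratch));
	assert(hdr_aligned_p(scratch));
	return scratch;
}
```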


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase an "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets, that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handing: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.361 27-Sep-2017 ozaki-r

Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).


Revision tags: nick-nhusb-base-20170825
# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that would make debugging by bisection difficult.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock, and (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed at any time
during packet processing, and accessing such an object via a pointer is
racy. Instead, we have to get the object from the interface collection
(ifindex2ifnet) via an interface index (if_index) that is stored in the
mbuf instead of a pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists as a
transition moratorium, i.e., it is intended for places that are not
planned to be MP-ified soon.

The change adds some psref overhead to performance-sensitive paths;
however, the overhead is not serious: 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, e.g., on settimeofday(2), as described in time_second(9). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
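
The conversion described in this entry amounts to shifting a timestamp by
the current offset between the two clocks. A minimal sketch, assuming
hypothetical helpers mono_to_wall() and wall_to_mono() that take the two
current clock readings as arguments instead of reading the kernel's
time_second and time_uptime directly:

```c
#include <time.h>

/* Hypothetical helper: convert a time_uptime-based timestamp to a
   time_second-based (wall-clock) one, given both clocks' current values. */
static time_t
mono_to_wall(time_t t_mono, time_t now_wall, time_t now_mono)
{
	return t_mono + (now_wall - now_mono);
}

/* Hypothetical inverse, e.g. for a wall-clock value supplied by userland. */
static time_t
wall_to_mono(time_t t_wall, time_t now_wall, time_t now_mono)
{
	return t_wall - (now_wall - now_mono);
}

/* Expiry is decided on the monotonic clock only, so a settimeofday(2)
   jump of the wall clock cannot make an entry expire early or late. */
static int
expired(time_t expire_mono, time_t now_mono)
{
	return now_mono >= expire_mono;
}
```

Keeping the stored value monotonic and converting only at the userland
boundary means the offset is applied once per export rather than baked
into every cached expiry.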


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically, based on whether IPsec is enabled and
whether any rules exist.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only a few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals (not set yet).


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
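
The entry only names the technique. As a rough illustration, and not the
NetBSD ip_id code itself, the sketch below hands out 16-bit IDs by
performing one step of a Fisher-Yates shuffle per call, so each ID is
returned exactly once per pass before the array is reshuffled. It omits
the sliding window, the names idgen, idgen_init() and idgen_next() are
hypothetical, and rand() is only a placeholder for a real CSPRNG such as
the arc4random the entry mentions:

```c
#include <stdint.h>
#include <stdlib.h>

#define ID_SPACE 65536u          /* number of possible 16-bit IDs */

/* Hypothetical generator state: a permutation that is shuffled lazily. */
struct idgen {
	uint16_t perm[ID_SPACE];
	uint32_t next;           /* next slot to finalize in this pass */
};

static void
idgen_init(struct idgen *g)
{
	for (uint32_t i = 0; i < ID_SPACE; i++)
		g->perm[i] = (uint16_t)i;
	g->next = 0;
}

/* Value in [0, n) built from 30 bits of rand(); has a slight modulo
   bias and is NOT cryptographically secure (placeholder only). */
static uint32_t
rand_below(uint32_t n)
{
	uint32_t r = ((uint32_t)(rand() & 0x7fff) << 15) |
	    (uint32_t)(rand() & 0x7fff);
	return r % n;
}

/* One Fisher-Yates step: swap slot i with a random slot in [i, N) and
   return the value that lands in slot i. */
static uint16_t
idgen_next(struct idgen *g)
{
	if (g->next == ID_SPACE)
		g->next = 0;     /* pass exhausted: reshuffle the same array */
	uint32_t i = g->next++;
	uint32_t j = i + rand_below(ID_SPACE - i);
	uint16_t tmp = g->perm[i];

	g->perm[i] = g->perm[j];
	g->perm[j] = tmp;
	return g->perm[i];
}
```

Because the array always holds a permutation of all IDs, no ID repeats
within a pass of 65536 calls, which is the property that distinguishes
this approach from drawing each ID independently at random.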


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the correctness of the statement
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain-specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards, ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent, though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
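A minimal C sketch of the IPv4 route-cache chain described above, for illustration only. It assumes stripped-down versions of struct route and struct rtentry; the names sketch_rtcache(), sketch_rtflush(), sketch_rtflushall(), in_rtcache_head and the ro_rtcache_next link field are hypothetical stand-ins for in_rtcache(), in_rtflush() and in_rtflushall(), and the reference-count decrement stands in for RTFREE(). The invariant it keeps is the one stated in the log: a route cache is on the chain exactly when ro_rt != NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the kernel's struct rtentry. */
struct rtentry {
	int rt_refcnt;
};

/* Hypothetical, reduced struct route with a chain link. */
struct route {
	struct rtentry *ro_rt;			/* cached route, or NULL */
	LIST_ENTRY(route) ro_rtcache_next;	/* on chain iff ro_rt != NULL */
};

/* Hypothetical chain head for all IPv4 route caches. */
static LIST_HEAD(, route) in_rtcache_head =
    LIST_HEAD_INITIALIZER(in_rtcache_head);

/* Like in_rtcache(): record a cache that now holds a route. */
static void
sketch_rtcache(struct route *ro)
{
	assert(ro->ro_rt != NULL);
	LIST_INSERT_HEAD(&in_rtcache_head, ro, ro_rtcache_next);
}

/* Like in_rtflush(): release the route, set ro_rt to NULL, unlink. */
static void
sketch_rtflush(struct route *ro)
{
	if (ro->ro_rt == NULL)
		return;
	ro->ro_rt->rt_refcnt--;		/* stands in for RTFREE() */
	ro->ro_rt = NULL;
	LIST_REMOVE(ro, ro_rtcache_next);
}

/* Like in_rtflushall(): invalidate every cache on the chain. */
static void
sketch_rtflushall(void)
{
	struct route *ro;

	while ((ro = LIST_FIRST(&in_rtcache_head)) != NULL)
		sketch_rtflush(ro);
}
```

Because each flush unlinks the entry it clears, the flush-all loop can simply pop the head until the chain is empty, without a separate "safe" iterator.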


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or data structure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half a regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error with a datagram
that had an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route is found, as per
section 4.3.3.1 of RFC 1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check whether nmbclusters
and ip_nmbclusters are equal. If not, we recompute the IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
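A minimal sketch of the cached-limit check described in this entry. The globals reuse the names from the log (nmbclusters, ip_nmbclusters, ip_maxfrags) with made-up initial values; check_nmbcluster_params() and ip_reass_over_limit() are hypothetical helpers, only loosely modeled on the CHECK_NMBCLUSTER_PARAMS() check mentioned in 1.195, and the use of > rather than >= for the threshold is an assumption. It only decides when the drop-half phase should start; the drop itself is not shown.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel globals named in the log. */
static int nmbclusters = 1024;	/* system-wide mbuf cluster limit */
static int ip_nmbclusters;	/* cached copy; 0 until first check */
static int ip_maxfrags;		/* derived limit: nmbclusters / 4 */

/* Recompute derived IP limits only when nmbclusters has changed. */
static void
check_nmbcluster_params(void)
{
	if (ip_nmbclusters != nmbclusters) {
		ip_nmbclusters = nmbclusters;
		ip_maxfrags = ip_nmbclusters / 4;
	}
}

/* Nonzero if the reassembly queue holds too many fragments. */
static int
ip_reass_over_limit(int nfrags)
{
	assert(nfrags >= 0);
	check_nmbcluster_params();
	return nfrags > ip_maxfrags;
}
```

Caching the copy keeps the common path to a single integer comparison while still picking up a runtime change to nmbclusters.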


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP ids are now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add an IP(v4)-specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

back out previous; we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the option TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress ICMP source quench messages, based on the router-req RFC (they could
also be abused as a DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
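A hedged sketch of the 127/8 check described above. Addresses are taken in host byte order (the kernel would apply ntohl() to ip_src and ip_dst first), and exempting packets that arrive on a loopback interface is an assumption about how "must not appear on wire" is applied. The names is_loopback_net(), ip_addr_ok_on_wire() and LOOPBACK_NET are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOOPBACK_NET	127u	/* first octet of 127/8 */

/* True if a host-order IPv4 address lies in 127.0.0.0/8. */
static bool
is_loopback_net(uint32_t addr_host)
{
	return (addr_host >> 24) == LOOPBACK_NET;
}

/* False if a packet received from the wire carries a 127/8 src or dst. */
static bool
ip_addr_ok_on_wire(uint32_t src, uint32_t dst, bool rcvd_on_loopback)
{
	if (rcvd_on_loopback)
		return true;	/* assumed: loopback traffic is exempt */
	return !is_loopback_net(src) && !is_loopback_net(dst);
}
```

A caller rejecting a packet here would also count it, which is what the ips_badaddr increment in 1.130 adds.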


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax", which can be used to set the minimum
and maximum port numbers assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.
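A hedged sketch of the kind of bounds checks these option-processing fixes add, reduced to a standalone walk over an options buffer. The IPOPT_* values follow <netinet/ip.h> but are redefined here; ip_options_walk() is hypothetical and only validates lengths without processing any option. The cnt < 2 test covers the case named in this entry: with a single byte left, the length byte must not be read.

```c
#include <assert.h>
#include <stddef.h>

#define IPOPT_EOL	0	/* end of option list */
#define IPOPT_NOP	1	/* no operation (padding) */
#define IPOPT_OPTVAL	0	/* offset of option type byte */
#define IPOPT_OLEN	1	/* offset of option length byte */

/*
 * Walk cnt bytes of IP options. Returns -1 if every option fits,
 * otherwise the offset of the offending length byte.
 */
static int
ip_options_walk(const unsigned char *cp, int cnt)
{
	const unsigned char *base = cp;
	int optlen = 0;

	assert(cp != NULL);
	for (; cnt > 0; cnt -= optlen, cp += optlen) {
		int opt = cp[IPOPT_OPTVAL];

		if (opt == IPOPT_EOL)
			break;
		if (opt == IPOPT_NOP) {
			optlen = 1;
			continue;
		}
		/* The length byte itself must be inside the buffer. */
		if (cnt < IPOPT_OLEN + 1)
			return (int)(&cp[IPOPT_OLEN] - base);
		optlen = cp[IPOPT_OLEN];
		/* Length must cover type+len and must not overrun cnt. */
		if (optlen < IPOPT_OLEN + 1 || optlen > cnt)
			return (int)(&cp[IPOPT_OLEN] - base);
	}
	return -1;
}
```

Rejecting optlen < 2 also guarantees forward progress, since a zero length would otherwise loop forever on the same byte.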


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during the KAME merge (this is part of WIDE's experimental reassembly code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information and tunnel
decapsulation information are what I have in mind; there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there, so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets, that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlaid with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.360 27-Jul-2017 ozaki-r

Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies of SO_TIMESTAMP/SO_OTIMESTAMP processing into a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is separated
to avoid one big change that would make debugging by bisection difficult.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in an mbuf isn't safe once we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead
we have to get the object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and serves as a
transitional stopgap, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals; not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
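The deferred-drain pattern described in this entry can be sketched in userland C11 as follows. `cache_drain()`, `cache_fasttimo()` and the `cache_entries` counter are hypothetical names, not NetBSD's; an atomic flag stands in for whatever synchronization the kernel uses between the drain hook and the fast timeout handler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical cache whose memory can be released under pressure. */
static int cache_entries = 100;
static atomic_bool drain_needed = false;

/*
 * Drain hook: may be called with arbitrary locks held, so it does no
 * real work; it only records that a drain was requested.
 */
void
cache_drain(void)
{
	atomic_store(&drain_needed, true);
}

/*
 * Periodic fast-timeout handler: here it is safe to take the locks the
 * real cleanup needs.  Returns true if a drain was performed.
 */
bool
cache_fasttimo(void)
{
	/* Atomically consume the request so it is handled exactly once. */
	if (!atomic_exchange(&drain_needed, false))
		return false;
	/* ...acquire the cache's locks and free its entries here... */
	cache_entries = 0;
	return true;
}
```

The drain hook returns immediately, so calling it from a context that already holds locks cannot deadlock; the cost is that memory is released only on the next timer tick.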


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.
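A userland sketch of the general "move the whole queue onto the stack" batching pattern, assuming a pthread mutex as a stand-in for the kernel lock and hypothetical `struct pkt`/`struct pktq` types in place of mbufs and the IP input queue. It is not the committed ipintr() code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for an mbuf and an ifqueue. */
struct pkt {
	struct pkt	*next;
	int		 id;
};

struct pktq {
	struct pkt	*head;
	struct pkt	*tail;
	size_t		 len;
};

/* big_lock stands in for the kernel lock; shared_q for the IP input queue. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static struct pktq shared_q;

/* Producer side (e.g. a driver receive path): append one packet. */
void
pkt_enqueue(struct pkt *p)
{
	assert(p != NULL);
	p->next = NULL;
	pthread_mutex_lock(&big_lock);
	if (shared_q.tail != NULL)
		shared_q.tail->next = p;
	else
		shared_q.head = p;
	shared_q.tail = p;
	shared_q.len++;
	pthread_mutex_unlock(&big_lock);
}

/*
 * Consumer side: take the lock once, move the entire shared queue into a
 * queue owned by the caller (typically on its stack), and release the
 * lock.  The caller then processes the batch without holding the lock.
 */
struct pktq
pkt_take_all(void)
{
	struct pktq local;

	pthread_mutex_lock(&big_lock);
	local = shared_q;
	shared_q.head = shared_q.tail = NULL;
	shared_q.len = 0;
	pthread_mutex_unlock(&big_lock);
	return local;
}
```

A consumer would call `pkt_take_all()` once and then walk `batch.head` through the `next` links, saving each `next` pointer before handing a packet to input processing, so the lock is taken once per batch rather than once per packet.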


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.
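A minimal userland sketch of consuming this option. It assumes the BSD convention that the control message has `cmsg_level == IPPROTO_IP`, `cmsg_type == IP_RECVTTL` and a one-byte payload (Linux instead reports `IP_TTL` with an `int`); check ip(4) on the target system. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Ask the kernel to attach the TTL of each received datagram. */
int
enable_recvttl(int fd)
{
	int on = 1;

	return setsockopt(fd, IPPROTO_IP, IP_RECVTTL, &on, sizeof(on));
}

/*
 * Walk the control messages filled in by recvmsg(2) and return the TTL,
 * or -1 if none was attached.  Assumes a one-byte IP_RECVTTL payload.
 */
int
ttl_from_msghdr(struct msghdr *msg)
{
	struct cmsghdr *cm;

	for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == IPPROTO_IP &&
		    cm->cmsg_type == IP_RECVTTL &&
		    cm->cmsg_len >= CMSG_LEN(sizeof(uint8_t))) {
			uint8_t ttl;

			/* CMSG_DATA may be unaligned; copy instead of dereferencing. */
			memcpy(&ttl, CMSG_DATA(cm), sizeof(ttl));
			return ttl;
		}
	}
	return -1;
}
```

In a receiver, `enable_recvttl()` is called once after socket(2), and `ttl_from_msghdr()` after each recvmsg(2) whose `msg_control` buffer is at least `CMSG_SPACE(sizeof(uint8_t))` bytes.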


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
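The per-CPU scheme can be sketched as a two-dimensional array of `uint64_t` counters (the array-of-counters layout introduced in 1.264) that is summed only when the totals are requested. The index names, `NCPU_MAX` and the helper functions are illustrative assumptions, not NetBSD's `<net/net_stats.h>` interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative counter indices; an array replaces a struct of named fields. */
enum {
	STAT_TOTAL,	/* packets received */
	STAT_BADSUM,	/* header checksum errors */
	STAT_DELIVERED,	/* delivered to upper layers */
	NSTATS
};

#define NCPU_MAX 4	/* hypothetical CPU count for the sketch */

/* Each CPU bumps only its own row, so the fast path needs no lock or atomic. */
static uint64_t percpu_stats[NCPU_MAX][NSTATS];

static inline void
stat_inc(unsigned cpu, unsigned which)
{
	assert(cpu < NCPU_MAX && which < NSTATS);
	percpu_stats[cpu][which]++;
}

/* Collation on request (e.g. from a sysctl handler): sum each column. */
void
stat_collate(uint64_t out[NSTATS])
{
	for (size_t i = 0; i < NSTATS; i++) {
		out[i] = 0;
		for (size_t c = 0; c < NCPU_MAX; c++)
			out[i] += percpu_stats[c][i];
	}
}
```

The collated totals are only a snapshot, and on a 32-bit machine a concurrent reader could see a torn 64-bit counter; the sketch ignores both issues.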


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and I for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
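A userland sketch of the technique named in this entry, assuming a 32768-slot window: shuffle all 65536 ids once with Fisher-Yates, then on each request return the id under a moving cursor and swap it into a random slot behind the cursor, so the same id cannot recur until more than 32768 further ids have been handed out. The xorshift generator is a placeholder for a cryptographic source (the entry itself flags arc4random for replacement), and none of the names are NetBSD's.

```c
#include <assert.h>
#include <stdint.h>

#define ID_SPACE 65536u	/* all 16-bit ip_id values */
#define WINDOW   32768u	/* swap partner lies at most WINDOW-1 slots behind the cursor */

static uint16_t id_perm[ID_SPACE];	/* always a permutation of 0..65535 */
static uint32_t id_cursor;
static uint64_t rng_state = 0x9E3779B97F4A7C15ull;

/* Placeholder PRNG (xorshift64); a kernel would use a CSPRNG instead. */
static uint32_t
rng32(void)
{
	rng_state ^= rng_state << 13;
	rng_state ^= rng_state >> 7;
	rng_state ^= rng_state << 17;
	return (uint32_t)(rng_state >> 32);
}

void
randomid_init(void)
{
	for (uint32_t i = 0; i < ID_SPACE; i++)
		id_perm[i] = (uint16_t)i;
	/* Fisher-Yates shuffle of the whole id space (slight modulo bias). */
	for (uint32_t i = ID_SPACE - 1; i > 0; i--) {
		uint32_t j = rng32() % (i + 1);
		uint16_t t = id_perm[i];

		id_perm[i] = id_perm[j];
		id_perm[j] = t;
	}
	id_cursor = 0;
}

/*
 * Return the id under the cursor, then swap it into a random slot behind
 * the cursor.  The cursor only moves forward, so that id is not reached
 * again until more than WINDOW further calls have been made.
 */
uint16_t
randomid_next(void)
{
	uint32_t i = id_cursor % ID_SPACE;
	uint32_t j = (id_cursor + ID_SPACE - rng32() % WINDOW) % ID_SPACE;
	uint16_t r = id_perm[i];

	id_perm[i] = id_perm[j];
	id_perm[j] = r;
	id_cursor++;
	return r;
}
```

The initial full shuffle matters: with an identity-initialized table and only the sliding-window swaps, the first 32768 ids would come out in order.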


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
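The 68-byte rule from this entry and 1.259 can be expressed as a predicate over a fragment's offset field and total length. This is a sketch, assuming host-byte-order `ip_off`/`ip_len` values and that the first fragment's 68 bytes include the IP header; the constant names are hypothetical (they mirror the usual MF flag and 13-bit offset mask), and the committed code may differ (1.260 later fixed the offset calculation).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRAG_MF		0x2000	/* more-fragments flag in ip_off */
#define FRAG_OFFMASK	0x1fff	/* fragment offset field, in 8-byte units */
#define FRAG_MINSIZE	68	/* RFC 791: every module must forward 68 octets unfragmented */

/*
 * ip_off and ip_len are in host byte order; ip_len is the fragment's
 * total length including its header.
 */
bool
frag_acceptable(uint16_t ip_off, uint16_t ip_len)
{
	uint32_t off = (uint32_t)(ip_off & FRAG_OFFMASK) << 3;	/* bytes */

	if (off == 0 && !(ip_off & FRAG_MF))
		return true;	/* not a fragment at all */
	if (off != 0 && off < FRAG_MINSIZE)
		return false;	/* later fragment overlapping the first 68 bytes */
	if (off == 0 && ip_len < FRAG_MINSIZE)
		return false;	/* first fragment shorter than 68 bytes */
	return true;
}
```

Rejecting non-initial fragments that start inside the first 68 bytes means a later fragment cannot overwrite the transport header carried in the first one.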


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC, spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the correctness of the statement
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards, ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent, though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the TTL until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error() with a datagram
that had an incorrect checksum. (From Liam Foy)
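The ordering fix in this entry amounts to checking everything that can fail first and modifying the header only once forwarding is certain, keeping the checksum consistent with the new TTL. The sketch below works on an option-less IPv4 header held as ten host-order 16-bit words (word 4 is TTL/protocol, word 5 the checksum). The incremental update follows RFC 1624 and is shown for illustration only; the entry does not say how NetBSD recomputes the checksum, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 one's-complement checksum over host-order 16-bit words. */
uint16_t
cksum16(const uint16_t *w, size_t nwords)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < nwords; i++)
		sum += w[i];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m') when header word m becomes m'. */
uint16_t
cksum_update(uint16_t hc, uint16_t m, uint16_t m_new)
{
	uint32_t sum = (uint16_t)~hc + (uint16_t)~m + m_new;

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Returns 0 when the packet may be forwarded (TTL decremented, checksum
 * updated), -1 for no route and -2 for an expiring TTL.  In both error
 * cases the header is untouched, so an ICMP error quotes a valid header.
 */
int
ip_forward_sketch(uint16_t hdr[10], bool have_route)
{
	uint16_t old = hdr[4];

	if ((old >> 8) <= 1)
		return -2;			/* would expire: time exceeded */
	if (!have_route)
		return -1;			/* no route: unreachable */
	hdr[4] = (uint16_t)(old - 0x0100);	/* TTL is the high byte */
	hdr[5] = cksum_update(hdr[5], old, hdr[4]);
	return 0;
}
```

A header whose checksum is correct sums to zero under `cksum16()`, both before and after a successful `ip_forward_sketch()` call.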


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%

The threshhold for the `drop-half' phase is the new variable,
ip_maxfrags which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmclusters are equal. If not, we recompute Ip parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalcuated.

The counters ip_nfrags and ip_nfragpacketsr are now declared static
and uninitialized (bss), to discourage tampering with them.
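The drop-half policy and the cached-limit recomputation described above can be modelled in a few lines. This is a hypothetical, simplified sketch, not the NetBSD code: an int array stands in for the hashed reassembly queue, and reass_limits, reass_check_limits(), reass_drop_half() and reass_enforce() are invented names that only echo the variables named in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the rev 1.194 policy; the names echo the
 * commit message but this is not the NetBSD implementation.
 */
struct reass_limits {
	int cached_nmbclusters;	/* plays the role of ip_nmbclusters */
	int maxfrags;		/* plays the role of ip_maxfrags */
};

/* Recompute the derived limit only when nmbclusters has changed. */
void
reass_check_limits(struct reass_limits *l, int nmbclusters)
{
	assert(l != NULL);
	if (l->cached_nmbclusters != nmbclusters) {
		l->cached_nmbclusters = nmbclusters;
		l->maxfrags = nmbclusters / 4;
	}
}

/*
 * Multiplicative drop: keep every other entry of frags[] (a stand-in
 * for the queued fragments), compacting in place. Returns the number
 * of entries kept, i.e. ceil(nfrags / 2).
 */
size_t
reass_drop_half(int *frags, size_t nfrags)
{
	size_t kept = 0;

	assert(frags != NULL || nfrags == 0);
	for (size_t i = 0; i < nfrags; i += 2)
		frags[kept++] = frags[i];
	return kept;
}

/* Drop half only when the fragment count exceeds the current limit. */
size_t
reass_enforce(struct reass_limits *l, int nmbclusters, int *frags,
    size_t nfrags)
{
	reass_check_limits(l, nmbclusters);
	if (nfrags > (size_t)l->maxfrags)
		return reass_drop_half(frags, nfrags);
	return nfrags;
}
```

Halving on each overload, rather than trimming a fixed number of entries, is what lets the queue recover quickly even when the overload is sustained.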


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP ids are now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() corrides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench messages, based on router-req RFC (also could be abused
as a DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have a proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during the KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are what I have in mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL
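The effect of the hostzerobroadcast knob can be shown with a hypothetical helper over host-byte-order addresses: an all-ones host part is always a directed broadcast, while an all-zeros host part (the older BSD broadcast convention) counts only when the knob is nonzero. This is an illustrative sketch, not NetBSD's in_broadcast().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: addr, subnet and mask are in host byte order;
 * subnet must have no bits set outside mask.
 */
bool
is_directed_broadcast(uint32_t addr, uint32_t subnet, uint32_t mask,
    bool hostzerobroadcast)
{
	assert((subnet & ~mask) == 0);
	if ((addr & mask) != subnet)
		return false;			/* not on this subnet */
	if ((addr | mask) == 0xffffffffu)
		return true;			/* host part all ones */
	return hostzerobroadcast && addr == subnet;	/* host part all zeros */
}
```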


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets that have a Class-D address as the source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
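The check added here compares the total-length field with the header length encoded in ip_hl, which counts 32-bit words (hence the shift by 2). The sketch below places it among the usual IPv4 header length sanity checks; the function name ip_lengths_ok() and the separate avail parameter are assumptions for illustration, and converting ip_len to host order is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HLEN	20	/* header without options */

/*
 * ver_ihl: first octet of the IPv4 header (version, header length in
 * 32-bit words). ip_len: total length field, already in host order.
 * avail: number of bytes actually received.
 */
bool
ip_lengths_ok(uint8_t ver_ihl, uint16_t ip_len, size_t avail)
{
	unsigned int hlen = (unsigned int)(ver_ihl & 0x0f) << 2;

	if ((ver_ihl >> 4) != 4)
		return false;		/* not IPv4 */
	if (hlen < IPV4_MIN_HLEN)
		return false;		/* ip_hl below 5 */
	if (ip_len < hlen)
		return false;		/* the rev 1.81 check: ip_len < ip_hl << 2 */
	if (avail < ip_len)
		return false;		/* datagram truncated */
	return true;
}
```

Extra bytes beyond ip_len (for example link-layer padding) are allowed here, since a receiver would normally trim them rather than drop the packet.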


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Significantly improves performance on hosts with a large number of IP addresses.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlaid with a
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the IP headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'.


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.359 19-Jul-2017 ozaki-r

Correct a comment


Revision tags: perseant-stdc-iso10646-base
# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, e.g., by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is
separated out to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock, and (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer to an interface in an mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead
we have to get the object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant as a
transitional measure, i.e., it is intended to be used in places that are
not planned to be MP-ified soon.

The change adds some overhead due to psref to performance-sensitive paths;
however, the overhead is not serious: 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, e.g., via settimeofday(2), according to time_second(9). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on IPsec being enabled and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use an array to store hooks and thus be more
cache-friendly (there are only a few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
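The per-CPU scheme can be sketched in userland C as below. Each CPU increments only its own row of counters, and the rows are summed only when statistics are requested. The enum values, NCPU_SKETCH, and the explicit cpu argument are assumptions for illustration; the kernel uses its own per-CPU storage and would keep the thread on one CPU while incrementing. Keeping the counters in an array of uint64_t, as revision 1.264 describes, is what reduces collation to a simple loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCPU_SKETCH 4			/* hypothetical CPU count */

enum ipstat_sketch {			/* hypothetical subset of IP counters */
	IPS_TOTAL,			/* packets received */
	IPS_BADSUM,			/* header checksum errors */
	IPS_FORWARD,			/* packets forwarded */
	IPS_NSTATS
};

/* Each CPU bumps only its own row, so the fast path needs no lock. */
static uint64_t ipstat_percpu[NCPU_SKETCH][IPS_NSTATS];

void
ipstat_inc(unsigned cpu, enum ipstat_sketch st)
{
	assert(cpu < NCPU_SKETCH && st < IPS_NSTATS);
	ipstat_percpu[cpu][st]++;
}

/* Sum the rows only when someone asks, e.g. from a sysctl handler. */
void
ipstat_collate(uint64_t out[IPS_NSTATS])
{
	memset(out, 0, IPS_NSTATS * sizeof(out[0]));
	for (unsigned c = 0; c < NCPU_SKETCH; c++)
		for (unsigned s = 0; s < IPS_NSTATS; s++)
			out[s] += ipstat_percpu[c][s];
}
```

A collation that runs while other CPUs are still counting may return totals that are slightly stale, which is normally acceptable for statistics.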


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and I for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
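The log does not spell out the exact algorithm, so what follows is only a sketch of one windowed-shuffle design of this kind, not NetBSD's ip_id.c. A table holding every 16-bit value is first put into random order with an inside-out Fisher-Yates shuffle. Each call then swaps the slot at a moving cursor with a slot up to 32767 positions behind it and returns the value that was at the cursor. Because a returned value is always parked behind the cursor, it cannot be returned again for at least 32768 further steps. The xorshift PRNG is a stand-in chosen only to make the sketch deterministic; a real implementation would draw from a cryptographic source such as arc4random. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ID_SPACE  65536u		/* every 16-bit IP ID value */
#define ID_WINDOW 32768u		/* maximum backward swap distance */

static uint16_t id_perm[ID_SPACE];
static uint32_t id_index;		/* moving cursor */
static int id_initialized;
static uint32_t rng_state = 2463534242u;	/* hypothetical fixed seed */

/* Placeholder PRNG (xorshift32); not suitable for real ID generation. */
static uint32_t
rng32(void)
{
	rng_state ^= rng_state << 13;
	rng_state ^= rng_state >> 17;
	rng_state ^= rng_state << 5;
	return rng_state;
}

static void
id_init(void)
{
	/* Inside-out Fisher-Yates: builds a random permutation of 0..65535. */
	for (uint32_t i = 0; i < ID_SPACE; i++) {
		uint32_t j = rng32() % (i + 1);	/* slight modulo bias: sketch only */
		id_perm[i] = id_perm[j];
		id_perm[j] = (uint16_t)i;
	}
	id_initialized = 1;
}

uint16_t
ip_randomid_sketch(void)
{
	uint16_t r;

	if (!id_initialized)
		id_init();
	do {
		/* Swap the cursor slot with one up to ID_WINDOW - 1 slots behind it. */
		uint32_t i = id_index & (ID_SPACE - 1);
		uint32_t i2 = (id_index - rng32() % ID_WINDOW) & (ID_SPACE - 1);

		r = id_perm[i];
		id_perm[i] = id_perm[i2];
		id_perm[i2] = r;		/* returned value now sits behind the cursor */
		id_index++;
	} while (r == 0);			/* never hand out ID 0 */
	assert(r != 0);
	return r;
}
```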


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
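A minimal sketch of the acceptance test described in revisions 1.258 and 1.259, assuming ip_off and ip_len have already been converted to host byte order. The 68-byte figure comes from RFC 791: every internet module must be able to forward a 68-octet datagram without further fragmentation (a header of up to 60 octets plus a minimum fragment of 8). Since offsets are counted in 8-byte units, the smallest accepted non-initial offset is 9 (72 bytes). MIN_FRAG_BYTES and frag_acceptable() are hypothetical names, and the sketch does not reproduce the actual ip_input.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IP_MF          0x2000	/* more-fragments flag (host byte order) */
#define IP_OFFMASK     0x1fff	/* fragment offset field, in 8-byte units */
#define MIN_FRAG_BYTES 68	/* RFC 791 minimum unfragmentable datagram */

/* Returns true if the (possibly fragmented) datagram passes the size rules. */
bool
frag_acceptable(uint16_t ip_off, uint16_t ip_len)
{
	uint32_t off_bytes = (uint32_t)(ip_off & IP_OFFMASK) * 8;
	bool more = (ip_off & IP_MF) != 0;

	if (off_bytes == 0 && !more)
		return true;				/* not a fragment at all */
	if (off_bytes == 0)
		return ip_len >= MIN_FRAG_BYTES;	/* first fragment: long enough? */
	return off_bytes >= MIN_FRAG_BYTES;		/* later fragment: starts late enough? */
}
```

Rejecting small non-initial offsets keeps a later fragment from overlapping and rewriting the transport header carried in the first one.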


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.
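The idea behind moving the header instead of the tail can be shown on a flat buffer. In this hypothetical sketch, the 20-byte fixed header slides forward over the options, and the start pointer then advances past the freed bytes, which is the job m_adj(9) does on an mbuf chain. Only 20 bytes move, however large the payload is. The sketch also rewrites the header length and total length fields; it does not recompute the header checksum, which a caller would still have to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IP_FIXED_HLEN 20	/* option-less IPv4 header */

/*
 * buf holds an IPv4 datagram of *lenp bytes whose header length (IHL)
 * may include options.  Returns the new start of the datagram with the
 * options removed and updates *lenp.  (Hypothetical helper.)
 */
uint8_t *
strip_options_sketch(uint8_t *buf, size_t *lenp)
{
	size_t hlen = (size_t)(buf[0] & 0x0f) * 4;
	size_t optlen;

	assert(hlen >= IP_FIXED_HLEN && hlen <= *lenp);
	optlen = hlen - IP_FIXED_HLEN;
	if (optlen == 0)
		return buf;
	/* Slide the fixed header forward over the options... */
	memmove(buf + optlen, buf, IP_FIXED_HLEN);
	/* ...then trim optlen bytes from the front, as m_adj() would. */
	buf += optlen;
	*lenp -= optlen;
	buf[0] = (uint8_t)((buf[0] & 0xf0) | (IP_FIXED_HLEN / 4));	/* IHL = 5 */
	buf[2] = (uint8_t)(*lenp >> 8);		/* total length, network order */
	buf[3] = (uint8_t)(*lenp & 0xff);
	return buf;
}
```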


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
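The per-domain list of live route caches can be sketched as a singly linked list whose flush walks every entry. The structures below are hypothetical stand-ins for struct route, struct rtentry, and struct domain, and the plain reference count replaces the kernel's real route reference handling; this is an illustration of the bookkeeping, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

struct rtentry_stub {				/* hypothetical stand-in for struct rtentry */
	int refcnt;
};

struct route_cache {				/* hypothetical, modeled on struct route */
	struct rtentry_stub *rt;		/* cached route, or NULL */
	struct route_cache *next;		/* link on the domain's list of live caches */
};

struct domain_stub {				/* hypothetical, modeled on struct domain */
	struct route_cache *rtcache_list;	/* cf. dom_rtcache */
};

/* Record a freshly looked-up route; ro must not already be on the list. */
void
cache_register(struct domain_stub *d, struct route_cache *ro,
    struct rtentry_stub *rt)
{
	assert(rt != NULL);
	rt->refcnt++;				/* the cache holds a reference */
	ro->rt = rt;
	ro->next = d->rtcache_list;
	d->rtcache_list = ro;
}

/* Invalidate every live cache in the domain, in the spirit of rtflushall(). */
void
cache_flushall(struct domain_stub *d)
{
	struct route_cache *ro, *next;

	for (ro = d->rtcache_list; ro != NULL; ro = next) {
		next = ro->next;
		ro->rt->refcnt--;		/* drop the cache's reference */
		ro->rt = NULL;			/* next user must look the route up again */
		ro->next = NULL;
	}
	d->rtcache_list = NULL;
}
```

Because every holder of a cached route sees ro->rt turn to NULL, callers only need a single NULL check to know a fresh lookup is required.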


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement is evidently
correct "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards, ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or data structure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still a work in progress; it's far from done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; its CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Before, if there was no route, we would call icmp_error with a datagram
whose checksum was incorrect. (From Liam Foy)
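The ordering fix can be illustrated with a sketch that performs every check that might produce an ICMP error before touching the header, so the copy of the original datagram quoted in the error still carries a valid checksum. The flat 20-byte header, the byte offsets, and the function names are assumptions for illustration. The checksum is the standard RFC 1071 one's-complement sum, recomputed in full here rather than updated incrementally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPH_LEN 20	/* option-less IPv4 header */
#define IPH_TTL 8	/* byte offset of the TTL field */
#define IPH_SUM 10	/* byte offset of the header checksum */

enum fwd_action { FWD_SEND, FWD_ICMP_TIMXCEED, FWD_ICMP_UNREACH_NET };

/* RFC 1071 Internet checksum over an even-length buffer. */
uint16_t
in_cksum_sketch(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

void
set_cksum(uint8_t *h)
{
	uint16_t s;

	h[IPH_SUM] = h[IPH_SUM + 1] = 0;
	s = in_cksum_sketch(h, IPH_LEN);
	h[IPH_SUM] = (uint8_t)(s >> 8);
	h[IPH_SUM + 1] = (uint8_t)(s & 0xff);
}

enum fwd_action
forward_sketch(uint8_t *h, bool have_route)
{
	assert(h != NULL);
	if (h[IPH_TTL] <= 1)
		return FWD_ICMP_TIMXCEED;	/* header untouched */
	if (!have_route)
		return FWD_ICMP_UNREACH_NET;	/* header and checksum still intact */
	/* Only once forwarding is certain: decrement TTL, fix the checksum. */
	h[IPH_TTL]--;
	set_cksum(h);
	return FWD_SEND;
}
```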


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When we are under pressure for mbufs or have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable
ip_maxfrags, which is calculated as nmbclusters/4.
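One way to realize the drop-half step is sketched below, under the assumption that older reassembly entries (lower remaining TTL) should be discarded first. It builds a histogram of queued fragments by TTL, finds the smallest TTL at or below which at least half of all fragments lie, and discards every packet at or under that cutoff. Because whole packets are dropped, the result can overshoot 50%. The structure, function name, and TTL bound are hypothetical, and this is not presented as NetBSD's exact code.

```c
#include <assert.h>
#include <stddef.h>

#define FRAG_TTL_MAX 60		/* hypothetical upper bound on reassembly TTL */

struct reass_q {		/* hypothetical stand-in for struct ipq */
	int ttl;		/* remaining lifetime; lower means older */
	unsigned nfrags;	/* fragments queued for this packet */
	int live;		/* nonzero while the packet is queued */
};

/* Drop the oldest packets until at least half of all fragments are gone. */
unsigned
reass_drop_half(struct reass_q *q, size_t n)
{
	unsigned hist[FRAG_TTL_MAX + 1] = { 0 };
	unsigned total = 0, acc = 0, dropped = 0;
	int cutoff = -1;

	for (size_t i = 0; i < n; i++) {
		if (!q[i].live)
			continue;
		assert(q[i].ttl >= 0 && q[i].ttl <= FRAG_TTL_MAX);
		hist[q[i].ttl] += q[i].nfrags;
		total += q[i].nfrags;
	}
	/* Smallest TTL such that fragments with ttl <= it make up >= half. */
	for (int t = 0; t <= FRAG_TTL_MAX; t++) {
		acc += hist[t];
		if (2 * acc >= total) {
			cutoff = t;
			break;
		}
	}
	for (size_t i = 0; i < n; i++) {
		if (q[i].live && q[i].ttl <= cutoff) {
			dropped += q[i].nfrags;
			q[i].live = 0;
		}
	}
	return dropped;
}
```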

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
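The cached-copy check described in this entry amounts to comparing two integers before each use. In the sketch below, nmbclusters is a plain global standing in for the kernel tunable, and the helper names ip_nmbclusters_changed() and ip_current_maxfrags() are hypothetical; only ip_nmbclusters, ip_maxfrags, and the nmbclusters/4 ratio come from the log.

```c
#include <assert.h>

/* Stand-in for the kernel-wide nmbclusters tunable (hypothetical here). */
int nmbclusters = 1024;

static int ip_nmbclusters;	/* cached copy of nmbclusters */
static int ip_maxfrags;		/* derived limit: nmbclusters / 4 */

/* Recompute every IP parameter derived from nmbclusters. */
static void
ip_nmbclusters_changed(void)
{
	assert(nmbclusters > 0);
	ip_maxfrags = nmbclusters / 4;
	ip_nmbclusters = nmbclusters;
}

/* Cheap check before each use; recompute only if the tunable moved. */
int
ip_current_maxfrags(void)
{
	if (ip_nmbclusters != nmbclusters)
		ip_nmbclusters_changed();
	return ip_maxfrags;
}
```

The design keeps the common path to a single comparison while still picking up a change to nmbclusters made at run time.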


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.
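A minimal sketch of a hashed reassembly table, assuming the datagram key is (source, destination, IP identification, protocol), the fields RFC 791 uses to tell one datagram's fragments from another's. The bucket count and the hash mix are illustrative choices, not the function NetBSD or FreeBSD actually uses, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REASS_HASHSIZE 64			/* hypothetical; must be a power of two */
#define REASS_HASHMASK (REASS_HASHSIZE - 1)

struct reass_key {				/* fields that identify one datagram */
	uint32_t src, dst;			/* IPv4 addresses */
	uint16_t id;				/* IP identification */
	uint8_t proto;				/* upper-layer protocol */
};

struct reass_entry {				/* hypothetical stand-in for struct ipq */
	struct reass_key key;
	struct reass_entry *next;		/* next entry in the same bucket */
};

static struct reass_entry *reass_table[REASS_HASHSIZE];

/* Fold the key into a bucket index (an illustrative mix). */
static unsigned
reass_hash(const struct reass_key *k)
{
	uint32_t h = k->src ^ k->dst ^ ((uint32_t)k->id << 16) ^ k->id ^ k->proto;

	h ^= h >> 16;
	h ^= h >> 8;
	return h & REASS_HASHMASK;
}

static int
reass_key_eq(const struct reass_key *a, const struct reass_key *b)
{
	return a->src == b->src && a->dst == b->dst &&
	    a->id == b->id && a->proto == b->proto;
}

/* Only one bucket is scanned, not every datagram under reassembly. */
struct reass_entry *
reass_lookup(const struct reass_key *k)
{
	struct reass_entry *e;

	for (e = reass_table[reass_hash(k)]; e != NULL; e = e->next)
		if (reass_key_eq(&e->key, k))
			return e;
	return NULL;
}

void
reass_insert(struct reass_entry *e)
{
	unsigned b;

	assert(e != NULL);
	b = reass_hash(&e->key);
	e->next = reass_table[b];
	reass_table[b] = e;
}
```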


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4)-specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

back out previous; we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONSTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
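The 127/8 check described in 1.129 can be sketched as follows. This is an illustration, not the committed code. The function names are hypothetical, addresses are assumed to be in host byte order, and packets that arrive on a loopback interface are assumed to be exempt, because 127/8 is only forbidden "on wire".

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr (host byte order) lies in 127.0.0.0/8. */
static bool
in_is_loopback_net(uint32_t addr)
{
	return (addr >> 24) == 127;
}

/*
 * Hypothetical input check: reject a packet whose source or destination
 * is in 127/8 unless it was received on a loopback interface (assumption).
 */
static bool
ip_loopback_addrs_ok(uint32_t src, uint32_t dst, bool rcvif_is_loopback)
{
	if (rcvif_is_loopback)
		return true;
	return !in_is_loopback_net(src) && !in_is_loopback_net(dst);
}
```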


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets, that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
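Here is a sketch of where the 1.81 test sits among the usual IPv4 header length checks. The helper name and its arguments are hypothetical. The version, minimum-header, and truncation checks are assumed to mirror common BSD ip_input practice and are not taken from this commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: ver_ihl is the first header byte, ip_len the
 * total-length field (host order), pkt_len the bytes actually received.
 */
static bool
ip_hdr_lengths_ok(uint8_t ver_ihl, uint16_t ip_len, size_t pkt_len)
{
	unsigned hlen = (unsigned)(ver_ihl & 0x0f) << 2; /* ip_hl counts 32-bit words */

	if ((ver_ihl >> 4) != 4)  /* not IPv4 */
		return false;
	if (hlen < 20)            /* shorter than a minimal IPv4 header */
		return false;
	if (ip_len < hlen)        /* the 1.81 test: ip_len < ip_hl << 2 */
		return false;
	if (pkt_len < ip_len)     /* packet truncated relative to ip_len */
		return false;
	return true;
}
```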


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
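The incremental checksum adjustment mentioned in 1.63 can be illustrated with the RFC 1624 update rule HC' = ~(~HC + ~m + m'), where m is the old 16-bit word and m' is the new one. This is a minimal sketch and not the ipflow code itself, which may use a different but equivalent formulation. It assumes the header is held as ten 16-bit words, so that word 4 holds TTL (high byte) and protocol, and word 5 holds the checksum. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Full one's-complement checksum over n 16-bit words. */
static uint16_t
cksum16(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += w[i];
	while (sum >> 16)                 /* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: patch checksum hc after old_w is replaced by new_w. */
static uint16_t
cksum_update(uint16_t hc, uint16_t old_w, uint16_t new_w)
{
	uint32_t sum = (uint16_t)~hc;

	sum += (uint16_t)~old_w;
	sum += new_w;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Decrement the TTL by one and adjust the checksum without re-summing.
 * The caller is assumed to have checked that TTL > 1. */
static void
ttl_decrement(uint16_t hdr[10])
{
	uint16_t old_w = hdr[4];
	uint16_t new_w = (uint16_t)(old_w - 0x0100); /* TTL is the high byte */

	hdr[4] = new_w;
	hdr[5] = cksum_update(hdr[5], old_w, new_w);
}
```

The forwarding path touches only two words, so the cost does not depend on header length or options.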


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
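The range rules listed in 1.54 can be expressed as a small validator. This sketch is illustrative only. The function name is hypothetical, IPPORT_RESERVED is assumed to be 1024 (its usual <netinet/in.h> value), the noprivports flag stands in for the IPNOPRIVPORTS option, and negative values are also rejected.

```c
#include <stdbool.h>

#define ASSUMED_IPPORT_RESERVED 1024  /* assumed <netinet/in.h> value */
#define MAX_PORT 65535

/* Hypothetical check for a proposed (anonportmin, anonportmax) pair. */
static bool
anonport_range_ok(int min, int max, bool noprivports)
{
	int lowest = noprivports ? 0 : ASSUMED_IPPORT_RESERVED;

	if (min < 0 || max < 0)              /* no negative ports */
		return false;
	if (min > MAX_PORT || max > MAX_PORT)
		return false;
	if (min < lowest || max < lowest)    /* stay out of reserved ports */
		return false;
	return min < max;                    /* anonportmin < anonportmax */
}
```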


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlaid with a
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.358 08-Jul-2017 christos

Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies of the SO_TIMESTAMP/SO_OTIMESTAMP processing into a single
function, and add a SOOPT_TIMESTAMP define, reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is kept
separate to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead,
we have to get the object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref, which use psref(9)
for sleepable critical sections, and m_{get,put}_rcvif, which use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is a transitional
measure, i.e., it is intended for places that are not planned to be
MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.

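A minimal sketch of the conversion idea described above, assuming two samples taken at the same moment: `now_wall` (playing the role of time_second) and `now_up` (playing the role of time_uptime). The helper names are hypothetical and are not the kernel functions this revision added. Expiry checks stay on the monotonic clock; only values handed to or received from userland are re-based by the current offset between the two clocks.

```c
#include <assert.h>
#include <time.h>

/* Re-base an uptime-based timestamp onto the wall clock (for userland). */
static time_t
uptime_to_wall(time_t t_up, time_t now_wall, time_t now_up)
{
	/* Keep the distance from "now", measured on the other clock. */
	return t_up - now_up + now_wall;
}

/* Re-base a wall-clock timestamp from userland onto the uptime clock. */
static time_t
wall_to_uptime(time_t t_wall, time_t now_wall, time_t now_up)
{
	return t_wall - now_wall + now_up;
}

/*
 * Expiry is decided purely on the monotonic clock, so a
 * settimeofday(2) leap cannot make entries expire early or late.
 */
static int
cache_expired(time_t expire_up, time_t now_up)
{
	return now_up >= expire_up;
}
```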

Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on IPsec being enabled and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetics that lead to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and I for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.

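The log entry names the technique but not the code. As a rough, hypothetical sketch of one way such a scheme can work: keep all 65536 IDs in a permutation, issue the ID under a moving cursor, and swap it into a random slot up to 32767 positions behind the cursor, so an issued ID cannot come up again until the cursor wraps around to it. The names, the window size, and the xorshift generator (standing in for the arc4random the commit used) are assumptions, not the code from this revision.

```c
#include <assert.h>
#include <stdint.h>

#define ID_SPACE  65536u	/* number of 16-bit IDs */
#define ID_WINDOW 32768u	/* issued IDs are parked 0..ID_WINDOW-1 slots back */

static uint16_t id_perm[ID_SPACE];	/* always a permutation of 0..65535 */
static uint32_t id_index;		/* cursor: position of the next ID */
static uint32_t rng_state = 2463534242u;

/* Toy PRNG (xorshift32) standing in for a CSPRNG; NOT secure. */
static uint32_t
toy_random(void)
{
	uint32_t x = rng_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return rng_state = x;
}

/* Start from a fully shuffled permutation (classic Fisher-Yates). */
static void
id_init(void)
{
	for (uint32_t i = 0; i < ID_SPACE; i++)
		id_perm[i] = (uint16_t)i;
	for (uint32_t i = ID_SPACE - 1; i > 0; i--) {
		/* Slightly biased modulo; acceptable for illustration. */
		uint32_t j = toy_random() % (i + 1);
		uint16_t t = id_perm[i];

		id_perm[i] = id_perm[j];
		id_perm[j] = t;
	}
	id_index = 0;
}

/*
 * Issue the ID under the cursor and swap it with a random slot in the
 * window that ends at the cursor. The issued ID lands behind the
 * cursor, so it is not reached again for at least
 * ID_SPACE - ID_WINDOW + 1 steps. ID 0 is never returned.
 */
static uint16_t
id_next(void)
{
	uint16_t r;

	do {
		uint32_t i = id_index % ID_SPACE;
		/* Unsigned wrap-around is harmless: 2^32 is a multiple of ID_SPACE. */
		uint32_t j = (id_index - toy_random() % ID_WINDOW) % ID_SPACE;

		r = id_perm[i];
		id_perm[i] = id_perm[j];
		id_perm[j] = r;
		id_index++;
	} while (r == 0);
	return r;
}
```

With these parameters any 32768 consecutive IDs are distinct. A larger window mixes more, but it shortens the guaranteed reuse distance, which is ID_SPACE - ID_WINDOW + 1 steps.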

Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).

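A sketch of the checks described in this revision and its follow-up (1.259), under stated assumptions. The helper `ip_frag_acceptable` and the `IP_MINFRAGSIZE` macro are hypothetical, and the length is taken to be the fragment's total length including its header. The 68-byte figure matches RFC 791, which requires every internet module to forward a 68-octet datagram without further fragmentation (a 60-octet maximal header plus an 8-octet minimum fragment). Because fragment offsets count 8-byte units, the smallest non-zero offset that passes is 72 bytes.

```c
#include <assert.h>
#include <stdbool.h>

#define IP_MINFRAGSIZE 68	/* hypothetical name, for illustration only */

/*
 * off_bytes:  fragment offset in bytes ((ip_off & IP_OFFMASK) << 3)
 * more_frags: whether IP_MF is set
 * ip_len:     total length of this fragment, header included
 */
static bool
ip_frag_acceptable(unsigned off_bytes, bool more_frags, unsigned ip_len)
{
	if (off_bytes != 0)
		return off_bytes >= IP_MINFRAGSIZE;	/* non-initial fragment */
	if (more_frags)
		return ip_len >= IP_MINFRAGSIZE;	/* first fragment */
	return true;					/* not fragmented */
}
```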

# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC, spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
    const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
families of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
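A minimal userland sketch of the sockaddr allocation semantics in point 1, assuming a simplified BSD-style header (sa_len, sa_family) and using calloc()/free() in place of the per-family pool_get(9)/pool_put(9) pools. The xsockaddr_* names, the XAF_* constants, and the per-family sizes are hypothetical stand-ins, not the kernel's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical family numbers for the sketch. */
enum { XAF_INET = 2, XAF_INET6 = 24 };

/* Simplified BSD-style sockaddr with an explicit length byte. */
struct xsockaddr {
	uint8_t	sa_len;		/* total length of the structure */
	uint8_t	sa_family;	/* address family */
	char	sa_data[];	/* family-specific payload */
};

/* Per-family sizes, standing in for "one pool per address family". */
static size_t
xsockaddr_size(uint8_t af)
{
	switch (af) {
	case XAF_INET:	return 16;	/* like sizeof(struct sockaddr_in) */
	case XAF_INET6:	return 28;	/* like sizeof(struct sockaddr_in6) */
	default:	return 0;
	}
}

/* Like sockaddr_alloc(): right size for the family, header pre-filled.
 * Returns NULL for an unknown family or when memory is exhausted. */
struct xsockaddr *
xsockaddr_alloc(uint8_t af)
{
	size_t len = xsockaddr_size(af);
	struct xsockaddr *sa;

	if (len == 0 || (sa = calloc(1, len)) == NULL)
		return NULL;
	sa->sa_len = (uint8_t)len;
	sa->sa_family = af;
	return sa;
}

/* Like sockaddr_copy(): strcpy()-style; the families must match. */
struct xsockaddr *
xsockaddr_copy(struct xsockaddr *dst, const struct xsockaddr *src)
{
	assert(dst->sa_family == src->sa_family);	/* the KASSERT */
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like sockaddr_dup(): strdup()-style allocate-and-copy. */
struct xsockaddr *
xsockaddr_dup(const struct xsockaddr *src)
{
	struct xsockaddr *dst = xsockaddr_alloc(src->sa_family);

	return dst == NULL ? NULL : xsockaddr_copy(dst, src);
}

/* Like sockaddr_free(): return the object to its "pool". */
void
xsockaddr_free(struct xsockaddr *sa)
{
	free(sa);
}
```

For example, xsockaddr_dup() of an XAF_INET address yields a separate 16-byte object with the same sa_len, family, and payload, which is the strdup() analogy the log draws.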


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement is evidently
correct "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
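The rtcache_check() contract described in 1.240 (keep a cached route only while it is up, otherwise look it up again, so that afterwards ro_rt is valid or NULL) can be sketched in userland as follows. The toy table, the XRTF_UP flag, and the xrt_*/xrtcache_* names are hypothetical; real rtentry handling and locking are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XRTF_UP	0x1	/* hypothetical "route is usable" flag */

struct xrtentry {
	uint32_t	rt_dst;
	int		rt_flags;
	int		rt_refcnt;
};

struct xroute {
	uint32_t	 ro_dst;	/* cached destination */
	struct xrtentry	*ro_rt;		/* cached route, or NULL */
};

/* Toy forwarding table standing in for the real routing table. */
static struct xrtentry xtable[4];
static size_t xtable_n;

struct xrtentry *
xrt_add(uint32_t dst)
{
	struct xrtentry *rt;

	if (xtable_n == sizeof(xtable) / sizeof(xtable[0]))
		return NULL;
	rt = &xtable[xtable_n++];
	rt->rt_dst = dst;
	rt->rt_flags = XRTF_UP;
	rt->rt_refcnt = 0;
	return rt;
}

/* Lookup that hands the caller a reference, like rtalloc(). */
static struct xrtentry *
xrt_lookup(uint32_t dst)
{
	for (size_t i = 0; i < xtable_n; i++) {
		struct xrtentry *rt = &xtable[i];
		if (rt->rt_dst == dst && (rt->rt_flags & XRTF_UP)) {
			rt->rt_refcnt++;
			return rt;
		}
	}
	return NULL;
}

/* Forget the cached route and drop its reference (like rtcache_free). */
void
xrtcache_free(struct xroute *ro)
{
	if (ro->ro_rt != NULL) {
		ro->ro_rt->rt_refcnt--;
		ro->ro_rt = NULL;
	}
}

/* Like rtcache_check(): afterwards ro_rt is either valid or NULL, so a
 * single NULL test tells the caller whether a fresh lookup is needed. */
void
xrtcache_check(struct xroute *ro)
{
	assert(ro != NULL);
	if (ro->ro_rt != NULL && (ro->ro_rt->rt_flags & XRTF_UP))
		return;
	xrtcache_free(ro);
	ro->ro_rt = xrt_lookup(ro->ro_dst);
}
```

The point of the design is visible in the caller: after xrtcache_check(), code needs only `if (ro.ro_rt == NULL)` rather than separate "stale" and "missing" cases.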


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
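The chain of live IPv4 route caches described above can be sketched with an intrusive list: caching a route links the cache in, flushing one unlinks it and sets ro_rt to NULL, and flushing all simply flushes the head until the chain is empty. This is a hypothetical userland model (the rcache type and rc_* names are invented here), with reference counting left out.

```c
#include <assert.h>
#include <stddef.h>

struct rcache {
	void		 *ro_rt;	/* cached route; NULL when invalid */
	struct rcache	 *next;		/* next live cache in the chain */
	struct rcache	**prevp;	/* pointer that points at us */
};

/* Head of the chain: every cache with ro_rt != NULL is on it. */
static struct rcache *rcache_chain;

/* Like in_rtcache(): ro now holds rt, so put it on the chain. */
void
rc_cache(struct rcache *ro, void *rt)
{
	assert(ro->ro_rt == NULL && rt != NULL);
	ro->ro_rt = rt;
	ro->next = rcache_chain;
	if (rcache_chain != NULL)
		rcache_chain->prevp = &ro->next;
	ro->prevp = &rcache_chain;
	rcache_chain = ro;
}

/* Like in_rtflush(): invalidate one cache and unlink it. */
void
rc_flush(struct rcache *ro)
{
	if (ro->ro_rt == NULL)
		return;
	*ro->prevp = ro->next;
	if (ro->next != NULL)
		ro->next->prevp = ro->prevp;
	ro->ro_rt = NULL;
	ro->next = NULL;
	ro->prevp = NULL;
}

/* Like in_rtflushall(): invalidate every cache, e.g. after a route add. */
void
rc_flushall(void)
{
	while (rcache_chain != NULL)
		rc_flush(rcache_chain);
}

int
rc_chain_empty(void)
{
	return rcache_chain == NULL;
}
```

Keeping a back-pointer (prevp) makes removing an arbitrary cache O(1), which matters because rtflush() is called on individual caches far more often than rtflushall().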

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still a work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Before if there was no route, we would call icmp_error with a datagram
packet that has an incorrect checksum. (From Liam Foy)
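The ordering fix above can be illustrated with a small sketch: the TTL is only decremented once a route is known, so an ICMP error for "no route" quotes the original, correctly checksummed header. The checksum is then patched incrementally in the RFC 1141 style rather than recomputed. The forward_ttl() helper, its have_route flag, and the byte-array header representation are assumptions for illustration, not the kernel's ip_forward().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum fwd_result { FWD_OK, FWD_TIMXCEED, FWD_UNREACH };

/* RFC 1071 checksum over a header stored as big-endian bytes. */
uint16_t
ip_cksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)hdr[i] << 8 | hdr[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* hdr[8] is the TTL, hdr[10..11] the header checksum. */
enum fwd_result
forward_ttl(uint8_t *hdr, int have_route)
{
	uint32_t s;

	assert(hdr != NULL);
	if (hdr[8] <= 1)
		return FWD_TIMXCEED;	/* TTL would reach zero */
	if (!have_route)
		return FWD_UNREACH;	/* header (and checksum) untouched */
	hdr[8]--;			/* decrement only when forwarding */
	/*
	 * TTL is the high byte of a 16-bit word, so lowering it by one
	 * lowers the one's-complement sum by 0x0100; the stored checksum
	 * (its complement) therefore rises by 0x0100, with end-around carry.
	 */
	s = ((uint32_t)hdr[10] << 8 | hdr[11]) + 0x0100;
	s = (s & 0xffff) + (s >> 16);
	hdr[10] = (uint8_t)(s >> 8);
	hdr[11] = (uint8_t)(s & 0xff);
	return FWD_OK;
}
```

With a valid header, a failed route lookup leaves both TTL and checksum as received, and a successful one leaves a header whose checksum still verifies (ip_cksum() over the whole header returns 0).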


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0
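Why the signedness matters: with an unsigned index, a test like `i >= 0` is always true, so a count-down loop never terminates (i wraps around to UINT_MAX). A generic sketch of the signed form; sum_descending() is a hypothetical example, not the kernel's ip_reass_ttl_descr().

```c
#include <assert.h>

/* Walk v[] from the last element down to v[0]. */
int
sum_descending(const int *v, int n)
{
	int total = 0;

	assert(n >= 0);
	/* Signed i: after i == 0, i-- yields -1 and "i >= 0" fails. */
	for (int i = n - 1; i >= 0; i--)
		total += v[i];
	return total;
}
```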


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpacketsr are now declared static
and uninitialized (bss), to discourage tampering with them.
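The two mechanisms in 1.194, re-deriving ip_maxfrags (nmbclusters/4) only when the cached ip_nmbclusters goes stale, and dropping half of the queued fragments once over the limit, might look roughly like this. The reass_entry array and the "oldest entries first" drop order are assumptions made for the sketch; the log does not say which fragments the kernel chooses.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel variables named in the log. */
int nmbclusters = 1024;		/* system-wide mbuf cluster limit */
int ip_nmbclusters;		/* cached copy of nmbclusters */
int ip_maxfrags;		/* derived: nmbclusters / 4 */
int ip_nfrags;			/* fragments currently queued */

struct reass_entry {
	int nfrags;		/* fragments held by this datagram */
	int live;		/* still queued? */
};

/* Recompute derived limits only when nmbclusters has changed. */
static void
ip_check_params(void)
{
	if (ip_nmbclusters != nmbclusters) {
		ip_nmbclusters = nmbclusters;
		ip_maxfrags = nmbclusters / 4;
	}
}

/* Multiplicative drop: discard whole datagrams (oldest first, in this
 * sketch) until at most half of the fragments remain. */
int
ip_reass_drophalf_sketch(struct reass_entry *q, int n)
{
	int target = ip_nfrags / 2, dropped = 0;

	assert(n >= 0);
	for (int i = 0; i < n && ip_nfrags > target; i++) {
		if (!q[i].live)
			continue;
		ip_nfrags -= q[i].nfrags;
		dropped += q[i].nfrags;
		q[i].live = 0;
	}
	return dropped;
}

/* Called as fragments arrive: trim the queue once over the limit. */
void
ip_reass_pressure(struct reass_entry *q, int n)
{
	ip_check_params();
	if (ip_nfrags > ip_maxfrags)
		(void)ip_reass_drophalf_sketch(q, n);
}
```

Halving, rather than dropping just enough to get back under the limit, is what makes the recovery multiplicative: under sustained pressure the queue shrinks geometrically instead of hovering at the threshold.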


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.
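A sketch of the hashed reassembly lookup: fragments are matched on (src, dst, id, proto), but only the chain in one bucket is searched instead of a single global list. The table size and the bucket-mixing function below are assumptions for illustration, not the NetBSD or FreeBSD hash macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REASS_NHASH	64			/* assumed; a power of two */
#define REASS_HMASK	(REASS_NHASH - 1)

struct ipq_sketch {
	uint32_t		 src, dst;
	uint16_t		 id;
	uint8_t			 proto;
	struct ipq_sketch	*next;		/* bucket chain */
};

static struct ipq_sketch *reass_hash[REASS_NHASH];

/* Hypothetical bucket function: fold the addresses and the IP id. */
unsigned
reass_bucket(uint32_t src, uint32_t dst, uint16_t id)
{
	uint32_t h = src ^ dst ^ ((uint32_t)id << 16 | id);

	h ^= h >> 16;
	h ^= h >> 8;
	return h & REASS_HMASK;
}

void
reass_insert(struct ipq_sketch *fp)
{
	unsigned b;

	assert(fp != NULL);
	b = reass_bucket(fp->src, fp->dst, fp->id);
	fp->next = reass_hash[b];
	reass_hash[b] = fp;
}

/* Full-key comparison is still needed: different datagrams can share
 * a bucket. */
struct ipq_sketch *
reass_lookup(uint32_t src, uint32_t dst, uint16_t id, uint8_t proto)
{
	struct ipq_sketch *fp;

	for (fp = reass_hash[reass_bucket(src, dst, id)]; fp != NULL;
	    fp = fp->next)
		if (fp->src == src && fp->dst == dst && fp->id == id &&
		    fp->proto == proto)
			return fp;
	return NULL;
}
```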


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() corrides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
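A userland sketch of the align-if-needed idea: on strict-alignment targets the macro tests the low pointer bits, while with __NO_STRICT_ALIGNMENT defined it is the constant 1 and the copy path is compiled away. HDR_ALIGNED_P and hdr_align() are hypothetical names, and memcpy() into a caller-supplied buffer stands in for m_copyup(); the 4-byte requirement matches the IPv4 header's 32-bit fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef __NO_STRICT_ALIGNMENT
#define HDR_ALIGNED_P(p)	1	/* no test, no realign path */
#else
#define HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/* Return a pointer to an aligned copy of the header if p is misaligned.
 * buf must be 4-byte aligned and hold at least len bytes. */
const void *
hdr_align(const void *p, size_t len, uint32_t *buf)
{
	assert(p != NULL && buf != NULL);
	if (HDR_ALIGNED_P(p))
		return p;		/* common case: use in place */
	memcpy(buf, p, len);		/* stand-in for m_copyup() */
	return buf;
}
```

As the note says, drivers that already deliver aligned packets never reach the memcpy() branch, so the cost on the fast path is a single mask test (or nothing at all).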


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench messages, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to the set minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch them in.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets, that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
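
A minimal sketch of this kind of length sanity check, assuming a raw IPv4 header in a byte buffer. The helper ip_hdr_lengths_ok() and the macro MIN_IPV4_HLEN are hypothetical and do not appear in ip_input.c, which performs these tests (plus version and checksum checks) on an mbuf. Since ip_hl counts 32-bit words, it is shifted left by 2 to get bytes before it is compared with ip_len.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_IPV4_HLEN 20	/* hypothetical name: smallest legal IPv4 header, in bytes */

/*
 * Hypothetical helper: return true if the IPv4 length fields in pkt are
 * self-consistent and fit within the caplen bytes actually received.
 */
static bool ip_hdr_lengths_ok(const uint8_t *pkt, size_t caplen)
{
	unsigned hlen, len;

	if (caplen < MIN_IPV4_HLEN)
		return false;
	hlen = (unsigned)(pkt[0] & 0x0f) << 2;	/* ip_hl counts 32-bit words */
	len = ((unsigned)pkt[2] << 8) | pkt[3];	/* ip_len is in network order */
	if (hlen < MIN_IPV4_HLEN || hlen > caplen)
		return false;
	if (len < hlen)		/* the ip_len < ip_hl << 2 test */
		return false;
	if (caplen < len)	/* claimed length exceeds what arrived */
		return false;
	return true;
}
```

A header whose ip_len is smaller than its own header length is rejected before any option or payload processing touches bytes that were never part of the datagram.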


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
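
The incremental checksum adjustment mentioned above can be sketched with the update rule from RFC 1624 (eqn. 3), HC' = ~(~HC + ~m + m'), where m is the old 16-bit word and m' the new one. This is an illustration only, not the ipflow code itself; the functions fold(), cksum() and cksum_update() are hypothetical. Because the one's-complement sum is byte-order independent, host-order 16-bit words are used for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full Internet checksum over n 16-bit words (checksum field zeroed). */
static uint16_t cksum(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += w[i];
	return (uint16_t)~fold(sum);
}

/* RFC 1624 eqn. 3: new checksum when word m changes to m_new. */
static uint16_t cksum_update(uint16_t hc, uint16_t m, uint16_t m_new)
{
	uint32_t sum = (uint32_t)(uint16_t)~hc + (uint16_t)~m + m_new;

	return (uint16_t)~fold(sum);
}
```

For the sample header 4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7, decrementing the TTL changes word 4 from 0x4011 to 0x3f11 and the checksum from 0xb861 to 0xb961, the same value a full recomputation gives, while touching only one word.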


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.357 06-Jul-2017 christos

remove unnecessary casts (no functional change)


# 1.356 06-Jul-2017 christos

Merge the two copies of SO_TIMESTAMP/SO_OTIMESTAMP processing into a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.


Revision tags: netbsd-8-base
# 1.355 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer to an interface in an mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleepable critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and serves as a
transitional measure, i.e., it is intended to be used in places that are
not planned to be MP-ified soon.

The change adds some overhead due to psref to performance-sensitive paths;
however, the overhead is not serious: 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, e.g., because of settimeofday(2), according to time_second(9).
We should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.
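A minimal sketch of the conversion described above. It assumes the offset between the two time scales is taken from a snapshot of the current time_second and time_uptime values. The helper names and parameters are hypothetical and are not the kernel's actual API.

```c
#include <time.h>

/*
 * Hypothetical helpers: convert a timestamp between the monotonic
 * time_uptime scale and the wall-clock time_second scale, using the
 * current reading of both clocks (now_second, now_uptime) as the offset.
 */
static time_t
uptime_to_second(time_t t, time_t now_second, time_t now_uptime)
{
	/* Shift from "seconds since boot" to wall-clock seconds. */
	return t + (now_second - now_uptime);
}

static time_t
second_to_uptime(time_t t, time_t now_second, time_t now_uptime)
{
	/* Inverse shift: wall-clock seconds back to "seconds since boot". */
	return t - (now_second - now_uptime);
}
```

The offset is only valid for the snapshot it was computed from. If time_second leaps later, values converted before the leap are not adjusted. For this reason the internal bookkeeping stays on time_uptime, and the conversion happens only at the userland boundary.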

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on IPsec being enabled and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use an array to store hooks and thus be more cache
friendly (there are only a few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
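A minimal sketch of the offset check described above. It assumes the input is the IP header's 13-bit fragment-offset field, which counts 8-byte units. The function and macro names are hypothetical, and this is not the actual ip_input.c code. The value 68 matches the minimum datagram size that RFC 791 requires every IPv4 module to forward without further fragmentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimum start offset for non-initial fragments (hypothetical name). */
#define MIN_FRAG_START_BYTES 68

/*
 * Return true if a fragment's offset is acceptable: the initial
 * fragment (offset 0) is always allowed, while any later fragment
 * must start at byte 68 or beyond, so it cannot overwrite the
 * beginning of the transport header.
 */
static bool
frag_offset_ok(uint16_t frag_off_units)
{
	uint32_t off_bytes = (uint32_t)frag_off_units * 8;	/* 8-byte units */

	if (off_bytes == 0)
		return true;
	return off_bytes >= MIN_FRAG_START_BYTES;
}
```

Because offsets are multiples of 8, the smallest non-initial offset this check accepts is 9 units (72 bytes).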


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so the correctness of the statement
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or data structure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Before if there was no route, we would call icmp_error with a datagram
packet that has an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

backout previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameters to modern values.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench messages, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have a proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase an "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets, that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array on ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.355 01-Jun-2017 chs

remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.


Revision tags: prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later if we confirmed
nobody does actually.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong to. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetics that lead to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC, spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so the correctness of the statement
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care, when moving a 'struct route', to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; its CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error with a datagram
packet that has an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() corrides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have a proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove_hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase an "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlaid with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base
# 1.354 31-Mar-2017 ozaki-r

Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)


# 1.353 31-Mar-2017 ozaki-r

Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.


Revision tags: pgoyette-localcount-20170320
# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is separated
to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock, and (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider the case where m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead,
we have to get the object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant as a
transitional measure, i.e., it is intended to be used in places that are
not planned to be MP-ified soon.

The change adds some overhead due to psref to performance-sensitive paths;
however, the overhead is not serious: 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on IPsec being enabled and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use an array to store hooks and thus be more cache
friendly (there are only a few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
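The sketch below shows one way to read "a Fisher-Yates shuffle over a sliding window"; it is not the code from this revision. It keeps a permutation of all 65536 IDs in a circular array and, on each call, swaps a randomly chosen slot from the region ahead of the cursor into the cursor position, which is the incremental Fisher-Yates step. The ID_WINDOW most recently issued IDs sit behind the cursor and are never candidates, so no ID can recur within ID_WINDOW calls. ID_WINDOW, struct ipid_gen, and the xorshift32 generator are assumptions; xorshift32 is not cryptographically secure and only stands in for arc4random.

```c
#include <assert.h>
#include <stdint.h>

#define ID_SPACE  65536u /* every 16-bit IP ID value */
#define ID_WINDOW 32768u /* hypothetical: min calls before an ID may recur */

struct ipid_gen {
	uint16_t perm[ID_SPACE]; /* a permutation of all ID values */
	uint32_t pos;            /* cursor; wraps modulo ID_SPACE */
	uint32_t rng;            /* xorshift32 state, must be nonzero */
};

/* Stand-in PRNG (NOT cryptographic); a kernel would use a strong source. */
static uint32_t xorshift32(uint32_t *s)
{
	uint32_t x = *s;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *s = x;
}

static void ipid_init(struct ipid_gen *g, uint32_t seed)
{
	assert(ID_WINDOW < ID_SPACE);
	for (uint32_t i = 0; i < ID_SPACE; i++)
		g->perm[i] = (uint16_t)i;
	g->pos = 0;
	g->rng = seed ? seed : 1;
}

/*
 * Pick a slot uniformly (ignoring slight modulo bias) from the
 * ID_SPACE - ID_WINDOW slots starting at the cursor, swap it into the
 * cursor slot and return it. The ID_WINDOW slots behind the cursor hold
 * the most recent IDs and are never picked.
 */
static uint16_t ipid_next(struct ipid_gen *g)
{
	uint32_t span = ID_SPACE - ID_WINDOW;
	uint32_t j = (g->pos + xorshift32(&g->rng) % span) % ID_SPACE;
	uint16_t id = g->perm[j];

	g->perm[j] = g->perm[g->pos];
	g->perm[g->pos] = id;
	g->pos = (g->pos + 1) % ID_SPACE;
	return id;
}
```

The window size is a trade-off: a larger ID_WINDOW guarantees a longer gap before an ID is reused, but leaves fewer candidates per draw and so makes the next ID easier to guess.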


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
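A minimal sketch of the rule stated in this entry, not the check as committed: it assumes ip_off is the host-order 16-bit flags-and-offset word, and the helper name and the FRAG_OFFMASK/FRAG_MINSIZE constants are hypothetical (FRAG_OFFMASK mirrors the 13-bit offset field, which counts 8-byte units).

```c
#include <stdbool.h>
#include <stdint.h>

#define FRAG_OFFMASK 0x1fff /* fragment offset field, in 8-byte units */
#define FRAG_MINSIZE 68     /* minimum fragment size from RFC 791 */

/* Accept first fragments (offset 0); reject non-initial fragments that
 * would start inside the first FRAG_MINSIZE bytes of the datagram. */
static bool frag_offset_acceptable(uint16_t ip_off)
{
	uint32_t off_bytes = (uint32_t)(ip_off & FRAG_OFFMASK) << 3;

	return off_bytes == 0 || off_bytes >= FRAG_MINSIZE;
}
```

Because offsets are multiples of 8, the smallest non-zero offset this accepts is 72 bytes.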


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement is visibly
correct "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent, though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or data structure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still a work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the TTL until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error() with a
datagram whose checksum was incorrect. (From Liam Foy)
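A sketch of the ordering this entry describes, under stated assumptions: the header is a 20-byte array in network byte order with no options, have_route stands in for a real routing lookup, and all names are hypothetical. TTL and checksum are touched only after the route check succeeds, using the incremental update from RFC 1624 (eqn. 3), so the no-route path still holds a header with a valid checksum when it generates its ICMP error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_HDR_LEN 20 /* assumes no IP options */

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full RFC 1071 header checksum; bytes 10-11 (the checksum) count as 0. */
static uint16_t ipv4_hdr_cksum(const uint8_t h[IPV4_HDR_LEN])
{
	uint32_t sum = 0;

	for (size_t i = 0; i < IPV4_HDR_LEN; i += 2) {
		if (i == 10)
			continue;
		sum += ((uint32_t)h[i] << 8) | h[i + 1];
	}
	return (uint16_t)~csum_fold(sum);
}

/* Decrement TTL (byte 8) and patch the checksum: HC' = ~(~HC + ~m + m'),
 * where m is the old TTL/protocol word and m' the new one. */
static void ipv4_dec_ttl(uint8_t h[IPV4_HDR_LEN])
{
	assert(h[8] > 0);
	uint16_t old_word = (uint16_t)((h[8] << 8) | h[9]);
	h[8]--;
	uint16_t new_word = (uint16_t)((h[8] << 8) | h[9]);
	uint16_t hc = (uint16_t)((h[10] << 8) | h[11]);
	uint32_t sum = (uint16_t)~hc + (uint16_t)~old_word + new_word;
	uint16_t out = (uint16_t)~csum_fold(sum);

	h[10] = (uint8_t)(out >> 8);
	h[11] = (uint8_t)(out & 0xff);
}

enum fwd_action { FWD_SEND, FWD_ICMP_UNREACH, FWD_ICMP_TIMXCEED };

static enum fwd_action ipv4_forward_step(uint8_t h[IPV4_HDR_LEN], bool have_route)
{
	if (h[8] <= 1)
		return FWD_ICMP_TIMXCEED; /* TTL would expire here */
	if (!have_route)
		return FWD_ICMP_UNREACH;  /* header untouched, checksum valid */
	ipv4_dec_ttl(h);                  /* modify only once forwarding is certain */
	return FWD_SEND;
}
```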


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpacketsr are now declared static
and uninitialized (bss), to discourage tampering with them.
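A simplified sketch of the drop-half policy and the cached-nmbclusters check described in this entry. struct frag_queue, struct reass_limits, and the oldest-first choice of victims are assumptions made for illustration; the entry does not say how victims are selected, and this is not the kernel's struct ipq.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified reassembly state. */
struct frag_queue {
	unsigned nfrags; /* fragments held for this datagram */
	unsigned age;    /* ticks since its first fragment arrived */
	int live;        /* nonzero while the queue is in use */
};

struct reass_limits {
	unsigned cached_nmbclusters; /* nmbclusters the limit was derived from */
	unsigned maxfrags;           /* nmbclusters / 4 */
};

/* Re-derive the limit only when nmbclusters has changed. */
static void reass_check_limits(struct reass_limits *l, unsigned nmbclusters)
{
	if (l->cached_nmbclusters != nmbclusters) {
		l->cached_nmbclusters = nmbclusters;
		l->maxfrags = nmbclusters / 4;
	}
}

/* Drop whole datagrams, oldest first, until at least half of all queued
 * fragments are gone. Returns the number of fragments dropped. */
static unsigned reass_drophalf(struct frag_queue *q, size_t n, unsigned total)
{
	unsigned dropped = 0;

	while (dropped * 2 < total) {
		size_t oldest = n;
		for (size_t i = 0; i < n; i++)
			if (q[i].live && (oldest == n || q[i].age > q[oldest].age))
				oldest = i;
		if (oldest == n)
			break; /* nothing left to drop */
		dropped += q[oldest].nfrags;
		q[oldest].nfrags = 0;
		q[oldest].live = 0;
	}
	assert(dropped <= total); /* per-queue counts must agree with total */
	return dropped;
}

/* On fragment arrival: when over the limit, shed half multiplicatively. */
static unsigned reass_enforce(struct reass_limits *l, unsigned nmbclusters,
    struct frag_queue *q, size_t n, unsigned *total)
{
	unsigned dropped;

	reass_check_limits(l, nmbclusters);
	if (*total <= l->maxfrags)
		return 0;
	dropped = reass_drophalf(q, n, *total);
	*total -= dropped;
	return dropped;
}
```

Because whole datagrams are dropped, a pass may shed somewhat more than half of the fragments; the multiplicative step still pulls the total well below the limit instead of hovering just under it.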


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP ids are now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
'struct ifq' statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() corrides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

backout previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
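A minimal sketch in C of the address test described above, assuming addresses are already in host byte order. The helper names in_is_loopback_net() and ip_reject_loopback_on_wire() are hypothetical and this is not the NetBSD code; a real input path would also need to exempt packets that arrive on the loopback interface itself, which this sketch leaves out.

```c
#include <assert.h>   /* used by the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a host-order IPv4 address is in 127.0.0.0/8. */
static bool
in_is_loopback_net(uint32_t addr)
{
	return (addr >> 24) == 127;	/* the top octet selects the /8 */
}

/* Drop decision for packets seen on the wire: neither end may be 127/8. */
static bool
ip_reject_loopback_on_wire(uint32_t src, uint32_t dst)
{
	return in_is_loopback_net(src) || in_is_loopback_net(dst);
}
```

For example, a packet from 127.0.0.1 to 192.168.0.1 would be rejected, while 192.168.0.1 to 10.0.0.1 would pass.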


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, an mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not a negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove_hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there, so we patch them up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
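A minimal sketch of the length test described above. ip_hl counts 32-bit words, so ip_hl << 2 is the header length in bytes. The helper ip_lengths_ok() is hypothetical (not the NetBSD code); it takes the combined version/header-length byte and assumes ip_len has already been converted to host byte order. The extra 20-byte floor is the minimum IPv4 header size.

```c
#include <assert.h>   /* used by the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* vhl: first header byte, version in the high nibble, ip_hl in the low nibble. */
static bool
ip_lengths_ok(uint8_t vhl, uint16_t ip_len)
{
	unsigned hlen = (unsigned)(vhl & 0x0f) << 2;	/* ip_hl is in 32-bit words */

	if (hlen < 20)		/* shorter than a minimal IPv4 header */
		return false;
	if (ip_len < hlen)	/* total length cannot be shorter than the header */
		return false;
	return true;
}
```

For instance, vhl 0x4f declares a 60-byte header, so an ip_len of 40 is rejected.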


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
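A minimal sketch of one standard way to adjust the checksum incrementally when the TTL is decremented (RFC 1624, equation 3: HC' = ~(~HC + ~m + m')). This is not the ipflow code itself; the header is modelled as ten 16-bit words already in host order (word 4 holds TTL and protocol, word 5 the checksum), and in_cksum_words(), cksum_adjust() and ip_decrement_ttl() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full one's complement checksum over nwords 16-bit words. */
static uint16_t
in_cksum_words(const uint16_t *w, size_t nwords)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < nwords; i++)
		sum += w[i];
	while (sum >> 16)			/* fold end-around carries */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: update checksum hc when one word changes old_word -> new_word. */
static uint16_t
cksum_adjust(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
	uint32_t sum = (uint32_t)(uint16_t)~hc + (uint16_t)~old_word + new_word;

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Decrement the TTL (high byte of word 4) and patch the checksum in word 5. */
static void
ip_decrement_ttl(uint16_t hdr[10])
{
	uint16_t old = hdr[4];

	assert((old >> 8) > 0);			/* caller must not forward TTL 0 */
	hdr[4] = (uint16_t)(old - 0x0100);
	hdr[5] = cksum_adjust(hdr[5], old, hdr[4]);
}
```

Because only the TTL/protocol word changes, the update touches three 16-bit values instead of re-summing the whole header; the result matches a full recomputation.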


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Significantly improves performance on hosts with a large number of IP addresses.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.352 06-Mar-2017 ozaki-r

Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, e.g., by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is kept
separate to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this point, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock, and (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer to an interface in an mbuf

Keeping a pointer to an interface in an mbuf isn't safe once we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any point during
packet processing, and accessing such an object via a pointer is racy. Instead,
we have to look the object up in the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleepable critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant as a
transitional stopgap, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance-sensitive paths,
but the overhead is not serious: a 2% drop at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, e.g., via settimeofday(2), according to time_second(9). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
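Userland has the same split between a settable wall clock and a monotonic clock, so the idea behind 1.323 can be illustrated outside the kernel. This is a POSIX analogue, not the kernel code: it assumes clock_gettime(CLOCK_MONOTONIC) stands in for time_uptime and time(NULL) for time_second, and struct cache_entry and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Seconds since an arbitrary fixed point; unaffected by settimeofday(2). */
static time_t uptime_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec;
}

/* Hypothetical cache entry whose expiry is kept on the monotonic clock. */
struct cache_entry {
    time_t expire;
};

static void cache_entry_arm(struct cache_entry *e, time_t lifetime)
{
    e->expire = uptime_seconds() + lifetime;
}

static bool cache_entry_expired(const struct cache_entry *e)
{
    return uptime_seconds() >= e->expire;
}

/* Convert a monotonic deadline to wall-clock time only when reporting it
 * to a consumer that expects wall-clock values. */
static time_t deadline_to_wallclock(time_t mono_deadline)
{
    return time(NULL) + (mono_deadline - uptime_seconds());
}
```

Because the expiry never depends on the wall clock, stepping the system time forward or backward does not make entries expire early or live forever; only the reporting path converts between the two bases.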


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by a sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on IPsec being enabled and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use an array to store hooks and thus be more cache
friendly (there are only a few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
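The sliding-window shuffle described in 1.262 can be sketched in C as follows. This is a minimal, hypothetical sketch, not the NetBSD ip_id code: the names ipid_init/ipid_next, the 32768-slot window, and the xorshift32 PRNG (a non-cryptographic stand-in for arc4random) are all assumptions. Each call reads the slot under a cursor, swaps it with a random slot up to WINDOW-1 positions behind the cursor, and advances, so a returned ID is always parked behind the cursor.

```c
#include <stdint.h>

#define ID_SPACE 65536u /* every possible 16-bit IP ID value */
#define WINDOW   32768u /* swap partners come from this many slots behind the cursor */

static uint16_t perm[ID_SPACE];          /* permutation of all possible IDs */
static uint32_t cursor;                  /* advances by one per swap step */
static uint32_t rng_state = 2463534242u; /* seed for the placeholder PRNG */

/* Placeholder xorshift32 PRNG; NOT cryptographically secure. */
static uint32_t rng_next(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/* Start from the identity permutation; the swaps randomize it over time. */
static void ipid_init(void)
{
    for (uint32_t i = 0; i < ID_SPACE; i++)
        perm[i] = (uint16_t)i;
    cursor = 0;
}

/* Return the next nonzero ID. The returned value is moved to a slot the
 * cursor has already passed, so it cannot be handed out again for roughly
 * ID_SPACE - WINDOW further calls. */
static uint16_t ipid_next(void)
{
    uint16_t r;

    do {
        uint32_t i = cursor % ID_SPACE;
        uint32_t back = rng_next() % WINDOW;
        uint32_t j = (cursor + ID_SPACE - back) % ID_SPACE;

        r = perm[i];
        perm[i] = perm[j];
        perm[j] = r;
        cursor++;
    } while (r == 0); /* 0 is skipped, as an ID of zero is avoided */
    return r;
}
```

The window size trades unpredictability (a larger window spreads values further) against the guaranteed gap before an ID can repeat (a smaller window lengthens it).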


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement is visibly
correct "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or data structure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them: proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
which got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Before if there was no route, we would call icmp_error with a datagram
packet that has an incorrect checksum. (From Liam Foy)
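Decrementing the TTL also means patching the header checksum, so the order matters. The minimal sketch below assumes a plain 20-byte IPv4 header held in a byte array, and uses hypothetical helper names rather than ip_forward()'s code. It computes the checksum per RFC 1071 and, after the TTL decrement, updates it incrementally per RFC 1624 (HC' = ~(~HC + ~m + m')). The caller is assumed to have found a route already. If the TTL would expire, the header is left untouched, so any ICMP error quotes a datagram whose checksum is still valid.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes, words read big-endian. */
static uint16_t
in_cksum_bytes(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(p[i] << 8 | p[i + 1]);
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Store a freshly computed checksum into bytes 10-11 of an IPv4 header. */
static void
ip_set_cksum(uint8_t *hdr, size_t hlen)
{
	uint16_t c;

	hdr[10] = hdr[11] = 0;
	c = in_cksum_bytes(hdr, hlen);
	hdr[10] = c >> 8;
	hdr[11] = c & 0xff;
}

/*
 * Decrement the TTL (byte 8) and patch the checksum incrementally.
 * m is the old TTL/protocol word, mp the new one. Returns -1 without
 * modifying the header if the TTL would expire.
 */
static int
ip_dec_ttl(uint8_t *hdr)
{
	uint16_t hc, m, mp;
	uint32_t sum;

	if (hdr[8] <= 1)
		return -1;
	hc = (uint16_t)(hdr[10] << 8 | hdr[11]);
	m = (uint16_t)(hdr[8] << 8 | hdr[9]);
	hdr[8]--;
	mp = (uint16_t)(hdr[8] << 8 | hdr[9]);
	sum = (uint32_t)(uint16_t)~hc + (uint16_t)~m + mp;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	hc = (uint16_t)~sum;
	hdr[10] = hc >> 8;
	hdr[11] = hc & 0xff;
	return 0;
}
```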


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
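The drop-half policy can be sketched in user space. This is a minimal sketch that assumes a hypothetical flat array of per-packet fragment counts, oldest first, in place of the kernel's hashed struct ipq queues. The limit (like ip_maxfrags) is rederived only when the cached cluster count (like ip_nmbclusters) goes stale. Once the fragment total exceeds the limit, whole packets are dropped oldest-first until at most half of the fragments remain. The names frag_state, frag_check_params, frag_drain_half and frag_enforce_limit are illustrative only.

```c
#include <stddef.h>

#define MAXPKTS 64

/* Hypothetical reassembly state: one entry per partially reassembled
 * packet, oldest first, each holding that packet's fragment count. */
struct frag_state {
	int nmbclusters_cached;		/* like ip_nmbclusters */
	int maxfrags;			/* like ip_maxfrags = nmbclusters / 4 */
	int nfrags;			/* total queued fragments (ip_nfrags) */
	size_t npkts;
	int pkt_nfrags[MAXPKTS];	/* per-packet counts (ipq_nfrags) */
};

/* Recompute derived limits only when the cluster count has changed. */
static void
frag_check_params(struct frag_state *fs, int nmbclusters)
{
	if (fs->nmbclusters_cached != nmbclusters) {
		fs->nmbclusters_cached = nmbclusters;
		fs->maxfrags = nmbclusters / 4;
	}
}

/* Drop whole packets, oldest first, until at most half the fragments remain. */
static void
frag_drain_half(struct frag_state *fs)
{
	int target = fs->nfrags / 2;
	size_t i = 0, j;

	while (fs->nfrags > target && i < fs->npkts)
		fs->nfrags -= fs->pkt_nfrags[i++];
	/* Slide the surviving packets down to the front of the array. */
	for (j = 0; i + j < fs->npkts; j++)
		fs->pkt_nfrags[j] = fs->pkt_nfrags[i + j];
	fs->npkts = j;
}

/* Called before queueing a new fragment. */
static void
frag_enforce_limit(struct frag_state *fs, int nmbclusters)
{
	frag_check_params(fs, nmbclusters);
	if (fs->nfrags > fs->maxfrags)
		frag_drain_half(fs);
}
```

Halving, rather than trimming to just under the limit, is what lets the queue recover quickly when a lossy peer keeps refilling it.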


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
'struct ifq' statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

backout previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
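The align-if-needed pattern can be sketched as follows. This sketch assumes a hypothetical struct toy_ip header and uses C11 _Alignof for the alignment test. When __NO_STRICT_ALIGNMENT is defined the test expands to 1, so no check is paid for. Otherwise a misaligned header is copied into properly aligned storage, loosely analogous to what m_copyup() does for an mbuf. The macro and function names are illustrative, not the kernel's *_HDR_ALIGNED_P() definitions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 20-byte header with 32-bit fields (typically 4-byte aligned). */
struct toy_ip {
	uint32_t w[5];
};

#ifdef __NO_STRICT_ALIGNMENT
#define TOY_HDR_ALIGNED_P(p)	1
#else
#define TOY_HDR_ALIGNED_P(p) \
	((((uintptr_t)(p)) & (_Alignof(struct toy_ip) - 1)) == 0)
#endif

/*
 * Return a pointer through which the header can be accessed safely:
 * the original buffer if it is suitably aligned, otherwise a copy in tmp.
 */
static const struct toy_ip *
toy_ip_hdr(const unsigned char *buf, struct toy_ip *tmp)
{
	if (TOY_HDR_ALIGNED_P(buf))
		return (const struct toy_ip *)(const void *)buf;
	memcpy(tmp, buf, sizeof(*tmp));
	return tmp;
}
```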


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
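The idea of pre-computing and caching the pseudo-header checksum can be shown with a minimal sketch. It assumes IPv4 addresses and lengths are given as host-order integers. The partial one's-complement sum over the pseudo-header (source, destination, protocol, length) is computed once. The final checksum is completed later by adding the sum over the transport header and payload, which is the step software must do when the hardware cannot. The helper names are hypothetical; this is not the kernel's in_cksum() code.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint32_t
fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return sum;
}

/* One's complement sum of a buffer (words read big-endian), not inverted. */
static uint32_t
sum_bytes(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(p[i] << 8 | p[i + 1]);
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	return fold(sum);
}

/* Partial sum over the IPv4 pseudo-header; can be computed early and cached. */
static uint32_t
pseudo_sum(uint32_t src, uint32_t dst, uint8_t proto, uint16_t len)
{
	uint32_t sum = (src >> 16) + (src & 0xffff) +
	    (dst >> 16) + (dst & 0xffff) + proto + len;

	return fold(sum);
}

/* Finish the checksum by adding the transport header + payload sum. */
static uint16_t
finish_cksum(uint32_t pseudo, const uint8_t *seg, size_t len)
{
	return (uint16_t)~fold(pseudo + sum_bytes(seg, len));
}
```

Because one's complement addition is associative, the cached partial sum can be combined with the payload sum at any later point without changing the result.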


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from the ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems; not sure
if it is enough when you handle hundreds of thousands of tcp connections on
your system. if you have a proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
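The 127/8 rule above, combined with the class-D source check from revision 1.85, can be sketched as a single address filter. This is not the ip_input() code. It is a minimal sketch that assumes addresses are already in host byte order and that the caller knows whether the packet arrived on a loopback interface; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; addresses are host byte order in this sketch. */
#define ADDR_IS_LOOPBACK_NET(a) (((a) >> 24) == 127u)
#define ADDR_IS_CLASS_D(a)      (((a) & 0xf0000000u) == 0xe0000000u)

/* Hypothetical filter: returns false if the packet should be dropped. */
bool ip_addrs_acceptable(uint32_t src, uint32_t dst, bool rcvif_is_loopback)
{
    /* RFC 1122: 127/8 must never appear on the wire, so reject it as
     * either source or destination unless the packet came in on a
     * loopback interface. */
    if (!rcvif_is_loopback &&
        (ADDR_IS_LOOPBACK_NET(src) || ADDR_IS_LOOPBACK_NET(dst)))
        return false;

    /* A multicast (class D) address is never a valid source. */
    if (ADDR_IS_CLASS_D(src))
        return false;

    return true;
}
```

Exempting the loopback interface keeps local 127/8 traffic working; only packets that arrived from a real link are rejected. A multicast destination is still accepted here, since only the source is constrained.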


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not a negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port numbers assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, and tunnel
decapsulation information are what I have in mind; there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove_hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: the IETF document assumes those
files to be there, so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase an "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets, that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently only netatalk and
netinet are supported, and only netinet has been tested.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
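The incremental checksum adjustment mentioned above can be illustrated with the RFC 1624 update rule HC' = ~(~HC + ~m + m'), applied to the 16-bit word that holds TTL and protocol. This is not the ipflow code; it is a minimal sketch assuming a raw, option-free 20-byte IPv4 header in network byte order, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator to 16 bits, adding carries back in
 * (one's-complement end-around carry). */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical: full Internet checksum over a header of even length,
 * read as big-endian 16-bit words.  With the checksum field zeroed it
 * yields the value to store; over a valid header it yields 0. */
uint16_t ip_header_cksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    return (uint16_t)~fold16(sum);
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'), where m is the old value
 * of one 16-bit header word and m' is its new value. */
uint16_t cksum_update16(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~hc;

    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~fold16(sum);
}

/* Hypothetical: decrement the TTL (byte 8) and patch the checksum
 * (bytes 10-11) without re-summing the header.  Returns false, leaving
 * the header untouched, if the TTL would reach zero. */
bool ip_decrement_ttl(uint8_t *hdr)
{
    if (hdr[8] <= 1)
        return false;

    /* TTL shares a 16-bit word with the protocol field (byte 9). */
    uint16_t old_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    hdr[8]--;
    uint16_t new_word = (uint16_t)((hdr[8] << 8) | hdr[9]);

    uint16_t hc = (uint16_t)((hdr[10] << 8) | hdr[11]);
    hc = cksum_update16(hc, old_word, new_word);
    hdr[10] = (uint8_t)(hc >> 8);
    hdr[11] = (uint8_t)(hc & 0xff);
    return true;
}
```

Only one header word changes, so the update costs a few additions instead of a pass over the whole header. Packets whose TTL would expire are left for the slow path, which is responsible for generating the ICMP time-exceeded message.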


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.351 17-Feb-2017 ozaki-r

Fix return value


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is split out
to avoid one big change that would make debugging by bisection difficult.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer to an interface in an mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead
we have to get the object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, e.g. via settimeofday(2), according to time_second(9). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on IPsec being enabled and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use an array to store hooks and thus be more cache
friendly (there are only a few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and I for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the correctness of the statement
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them: proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
which got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl until we are sure that we can forward this packet.
Previously, if there was no route, we would call icmp_error with a datagram
that had an incorrect checksum. (From Liam Foy)
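The ordering fix can be illustrated with a small C sketch over a raw 20-byte IPv4 header without options. Everything here is hypothetical: `have_route` stands in for the routing lookup, and the checksum is recomputed in full rather than updated incrementally as a real forwarding path might do. The point is only that the TTL (byte 8) and the checksum (bytes 10 and 11) are modified after the route is known, so a header quoted back in an ICMP error stays self-consistent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes in network byte order. */
static uint16_t in_cksum_hdr(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	assert(p != NULL);
	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	while (sum >> 16)			/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Zero the checksum field, then store the checksum of the 20-byte header. */
static void ip_set_cksum(uint8_t hdr[20])
{
	hdr[10] = hdr[11] = 0;
	uint16_t c = in_cksum_hdr(hdr, 20);
	hdr[10] = (uint8_t)(c >> 8);
	hdr[11] = (uint8_t)(c & 0xff);
}

enum fwd_result { FWD_OK, FWD_TTL_EXCEEDED, FWD_NO_ROUTE };

/* Hypothetical forwarding step: touch TTL/checksum only once a route exists. */
static enum fwd_result forward_sketch(uint8_t hdr[20], int have_route)
{
	if (hdr[8] <= 1)
		return FWD_TTL_EXCEEDED;	/* would send ICMP time exceeded */
	if (!have_route)
		return FWD_NO_ROUTE;		/* header untouched: TTL and checksum agree */
	hdr[8]--;				/* only now decrement the TTL ... */
	ip_set_cksum(hdr);			/* ... and recompute the checksum to match */
	return FWD_OK;
}
```

A received header whose checksum is correct sums to zero under `in_cksum_hdr()`, so that property can be checked before and after each call.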


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
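The cached-limit check and the drop-half phase described above can be sketched in C as follows. This is a simplified, hypothetical user-space model: the globals reuse the names from the log, but `drop_half()` works on a plain array rather than the kernel's hashed reassembly queues, and discarding every other fragment is only one possible way to realise a 50% drop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel variables named in the log. */
static int nmbclusters = 1024;	/* system-wide mbuf cluster limit (tunable) */
static int ip_nmbclusters;	/* cached copy of nmbclusters */
static int ip_maxfrags;		/* drop-half threshold: nmbclusters / 4 */

/* Recompute limits derived from nmbclusters only when the cache is stale. */
static void ip_nmbclusters_check(void)
{
	if (ip_nmbclusters != nmbclusters) {
		ip_nmbclusters = nmbclusters;
		ip_maxfrags = nmbclusters / 4;
	}
}

/*
 * Drop-half phase: keep the even-indexed entries of frags[0..n-1],
 * discard the odd-indexed ones, and compact the survivors to the
 * front. Returns the new count.
 */
static size_t drop_half(int *frags, size_t n)
{
	size_t kept = 0;

	assert(frags != NULL || n == 0);
	for (size_t i = 0; i < n; i += 2)
		frags[kept++] = frags[i];
	return kept;
}
```

In this model a caller would run `ip_nmbclusters_check()` before comparing its fragment count against `ip_maxfrags`, and would call `drop_half()` only when the count exceeds that threshold. Halving on each pass is what makes the backoff multiplicative rather than linear.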


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the option TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
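The 127/8 check can be sketched as a small C predicate. This is a hypothetical illustration, not the committed code. It takes addresses in host byte order (i.e. after ntohl()), and it assumes that packets received on a loopback interface are exempt, since locally looped traffic legitimately carries 127/8 addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOOPBACK_NET 127u	/* first octet of the loopback net, 127/8 */

/* addr is in host byte order. */
static bool in_loopback_net(uint32_t addr)
{
	return (addr >> 24) == LOOPBACK_NET;
}

/*
 * Return true if the packet should be dropped: its source or destination
 * is in 127/8, but it did not arrive on a loopback interface.
 */
static bool drop_wire_loopback(uint32_t src, uint32_t dst,
    bool rcvif_is_loopback)
{
	if (rcvif_is_loopback)
		return false;	/* assumed exemption for local traffic */
	return in_loopback_net(src) || in_loopback_net(dst);
}
```

Testing only the top octet keeps the check to a single shift and compare per address. This is why it can sit on the per-packet input path.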


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove_hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there, so we provide them.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase an "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets that have a Class-D address as their source address.
Implements the first half of PR 7003.
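A minimal C sketch of the test this entry describes, assuming the source address is held in network byte order as in struct ip. Class D is 224.0.0.0/4, so the check reduces to comparing the top four bits of the host-order address. The helper name src_is_class_d() is hypothetical and not taken from ip_input.c; the mask is spelled out because the byte-order convention of the IN_MULTICAST() macro differs between systems.

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: returns true if the source address (network
 * byte order) lies in 224.0.0.0/4, i.e. is a Class-D (multicast)
 * address. A packet for which this is true would be dropped.
 */
static bool
src_is_class_d(uint32_t src_net_order)
{
	return (ntohl(src_net_order) & 0xf0000000U) == 0xe0000000U;
}
```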


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently only netatalk and netinet
are supported, and only netinet has been tested.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
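A sketch of the length sanity check, assuming ip_hl is the header length in 32-bit words (as in struct ip) and ip_len is the total length in bytes, already in host byte order. The helper ip_lengths_ok() is hypothetical; the 20-byte minimum is the size of an IPv4 header without options.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns false if the packet should be dropped.
 * ip_hl << 2 converts the header length from 32-bit words to bytes.
 */
static bool
ip_lengths_ok(unsigned int ip_hl, uint16_t ip_len)
{
	unsigned int hlen = ip_hl << 2;	/* header length in bytes */

	if (hlen < 20)		/* shorter than a minimal IPv4 header */
		return false;
	if (ip_len < hlen)	/* total length smaller than the header itself */
		return false;
	return true;
}
```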


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
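The incremental checksum adjustment can be illustrated with RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'), where m and m' are the old and new values of the 16-bit word containing the TTL. This is a self-contained sketch of that technique, not the code in ip_flow.c; the function names are hypothetical, and the header is assumed to be a 20-byte big-endian buffer with the TTL at byte 8 and the checksum at bytes 10-11.

```c
#include <stddef.h>
#include <stdint.h>

/* Full RFC 1071 checksum over len bytes (len even), big-endian words. */
static uint16_t
ip_cksum_full(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(p[i] << 8 | p[i + 1]);
	while (sum >> 16)		/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m') in one's complement. */
static uint16_t
ip_cksum_adjust(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
	uint32_t sum = (uint16_t)~hc;

	sum += (uint16_t)~old_word;
	sum += new_word;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Decrement the TTL and patch the checksum; caller ensures TTL > 0. */
static void
ip_ttl_decrement(uint8_t *h)
{
	uint16_t old_word = (uint16_t)(h[8] << 8 | h[9]);	/* TTL, protocol */
	uint16_t hc = (uint16_t)(h[10] << 8 | h[11]);

	h[8]--;
	uint16_t new_word = (uint16_t)(h[8] << 8 | h[9]);
	hc = ip_cksum_adjust(hc, old_word, new_word);
	h[10] = (uint8_t)(hc >> 8);
	h[11] = (uint8_t)(hc & 0xff);
}
```

Only the one changed word is touched, so the cost per forwarded packet is constant regardless of header options.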


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that the options work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
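A sketch of masking the reserved bit when interpreting ip_off, assuming ip_off is in host byte order. The flag values match the conventional IP_RF/IP_DF/IP_MF/IP_OFFMASK definitions from netinet/ip.h; they are redefined here only so the block stands alone. The helpers is_fragment() and frag_offset_bytes() are hypothetical. Testing only IP_MF and the offset bits means a packet with the reserved bit set is not mistaken for a fragment.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef IP_RF
#define IP_RF      0x8000	/* reserved fragment flag (must be zero) */
#define IP_DF      0x4000	/* don't fragment */
#define IP_MF      0x2000	/* more fragments */
#define IP_OFFMASK 0x1fff	/* fragment offset, in 8-byte units */
#endif

/* A datagram is a fragment if MF is set or the offset is nonzero. */
static bool
is_fragment(uint16_t ip_off)
{
	return (ip_off & (IP_MF | IP_OFFMASK)) != 0;	/* ignores RF and DF */
}

/* Fragment offset in bytes; the flag bits never leak into it. */
static unsigned int
frag_offset_bytes(uint16_t ip_off)
{
	return (unsigned int)(ip_off & IP_OFFMASK) << 3;
}
```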


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlaid with a
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.350 17-Feb-2017 ozaki-r

Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is
separated out to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in a mbuf isn't safe if we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead,
we store an interface index (if_index) in the mbuf rather than a pointer, and
get the object from the interface collection (ifindex2ifnet) via that index.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance-sensitive paths;
however, the overhead is not serious: at worst a 2% drop.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, according to time_second(9), time_second
doesn't increase monotonically and can leap, e.g. via settimeofday(2). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically, based on whether IPsec is enabled and
whether rules exist.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
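The per-CPU idea can be modeled in userland as one row of uint64_t counters per CPU that are summed only when read. This is a hypothetical sketch, not the kernel code. NCPU, the STAT_* indices, and the function names are assumptions. The kernel uses its own per-CPU allocator and collation routine.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stat indices; the real IP statistics list is much longer. */
enum { STAT_TOTAL, STAT_BADSUM, STAT_FRAGMENTS, NSTATS };

#define NCPU 4

/*
 * Each CPU updates only its own row, so the fast path takes no shared
 * lock.  In a real kernel the updating thread must also not migrate to
 * another CPU in the middle of the increment.
 */
static uint64_t percpu_stats[NCPU][NSTATS];

static void
stat_inc(unsigned cpu, unsigned idx)
{
	percpu_stats[cpu][idx]++;
}

/* Collate: sum every CPU's row into one array, e.g. when sysctl reads it. */
static void
stats_collate(uint64_t out[NSTATS])
{
	memset(out, 0, NSTATS * sizeof(out[0]));
	for (unsigned c = 0; c < NCPU; c++)
		for (unsigned i = 0; i < NSTATS; i++)
			out[i] += percpu_stats[c][i];
}
```

The trade-off is that a read costs O(NCPU * NSTATS), which is acceptable because sysctl reads are rare compared to per-packet increments.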


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and I for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
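An incremental shuffle over a sliding window can be sketched in userland as follows. This is a hypothetical model, not the code from this revision. perm[] always holds a permutation of the ID space. Each draw performs one Fisher-Yates swap that moves a randomly chosen eligible id into the head slot. The ID_WINDOW most recently returned ids are held out, so none repeats within ID_WINDOW + 1 draws. The small ID_SPACE, the window size, and the xorshift32 generator (standing in for arc4random) are assumptions chosen for illustration.

```c
#include <stdint.h>

#define ID_SPACE  64u   /* a real IPv4 id space has 65536 values */
#define ID_WINDOW 16u   /* must be smaller than ID_SPACE */

struct idgen {
	uint16_t perm[ID_SPACE];    /* always a permutation of 0..ID_SPACE-1 */
	uint32_t head;              /* slot that receives the next id */
	uint32_t rng;               /* xorshift32 state, never zero */
};

static uint32_t
xorshift32(uint32_t *s)
{
	uint32_t x = *s;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *s = x;
}

static void
idgen_init(struct idgen *g, uint32_t seed)
{
	for (uint32_t i = 0; i < ID_SPACE; i++)
		g->perm[i] = (uint16_t)i;
	g->head = 0;
	g->rng = seed ? seed : 1;
}

static uint16_t
idgen_next(struct idgen *g)
{
	/*
	 * Eligible slots are head .. head + span - 1 (mod ID_SPACE); the
	 * ID_WINDOW slots just behind head hold the most recent ids.
	 */
	uint32_t span = ID_SPACE - ID_WINDOW;
	uint32_t j = (g->head + xorshift32(&g->rng) % span) % ID_SPACE;
	uint16_t id = g->perm[j];

	/* One Fisher-Yates step: swap the chosen id into the head slot. */
	g->perm[j] = g->perm[g->head];
	g->perm[g->head] = id;
	g->head = (g->head + 1) % ID_SPACE;
	return id;
}
```

The `% span` reduction has a slight modulo bias; a production generator would use an unbiased bounded draw such as arc4random_uniform(3).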


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
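The size test described in revisions 1.258 and 1.259 can be sketched as a predicate. This is an assumption about its shape, not a copy of ip_input.c. A non-initial fragment is rejected if its data would begin before byte 68 of the reassembled datagram (its byte offset plus the header length). A first fragment is rejected if its total length is under 68 bytes, the RFC 791 minimum. The function name and macro names are hypothetical. The flag values match <netinet/ip.h>'s IP_MF and IP_OFFMASK.

```c
#include <stdbool.h>
#include <stdint.h>

#define FRAG_MF      0x2000  /* more-fragments flag (IP_MF) */
#define FRAG_OFFMASK 0x1fff  /* offset field, in 8-byte units (IP_OFFMASK) */
#define MIN_FRAG     68      /* RFC 791 minimum datagram size */

/*
 * Hypothetical check.  ip_off is in host byte order; hlen and len are
 * this fragment's header length and total length, in bytes.
 */
static bool
frag_too_small(uint16_t ip_off, unsigned hlen, unsigned len)
{
	unsigned off = (unsigned)(ip_off & FRAG_OFFMASK) << 3;  /* bytes */

	if (off == 0 && (ip_off & FRAG_MF) == 0)
		return false;                   /* not a fragment at all */
	if (off > 0)
		return off + hlen < MIN_FRAG;   /* non-initial: starts too early */
	return len < MIN_FRAG;                  /* initial: too short */
}
```

Because offsets are in 8-byte units, the smallest accepted non-initial offset with a 20-byte header is 6 units (48 + 20 = 68 bytes).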


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC, spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement's correctness
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards, ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent, though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
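The chain of live route caches can be modeled in userland with a singly linked list. This is a hypothetical sketch. The model_* structures and functions mirror the roles of in_rtcache(), in_rtflush(), and in_rtflushall() without reproducing their kernel code, and reference counting on the rtentry is omitted.

```c
#include <stddef.h>

struct model_rt { int dummy; };         /* stand-in for struct rtentry */

struct model_route {
	struct model_rt    *ro_rt;      /* cached route, or NULL */
	struct model_route *ro_next;    /* link in the chain of live caches */
};

static struct model_route *rtcache_chain;   /* caches with ro_rt != NULL */

/* Record a newly cached route; ro must not already be on the chain. */
static void
model_rtcache(struct model_route *ro, struct model_rt *rt)
{
	ro->ro_rt = rt;
	ro->ro_next = rtcache_chain;
	rtcache_chain = ro;
}

/* Invalidate one cache: unlink it from the chain and clear ro_rt. */
static void
model_rtflush(struct model_route *ro)
{
	struct model_route **pp;

	for (pp = &rtcache_chain; *pp != NULL; pp = &(*pp)->ro_next) {
		if (*pp == ro) {
			*pp = ro->ro_next;
			break;
		}
	}
	ro->ro_rt = NULL;
	ro->ro_next = NULL;
}

/* Invalidate every cache, e.g. after a route is added. */
static void
model_rtflushall(void)
{
	while (rtcache_chain != NULL)
		model_rtflush(rtcache_chain);
}
```

Callers that hold a cache then see ro_rt turn to NULL and must perform a fresh lookup, which is the behavior the log asks ipflow_fastforward() and other users to expect.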

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or data structure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Before if there was no route, we would call icmp_error with a datagram
packet that has an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpacketsr are now declared static
and uninitialized (bss), to discourage tampering with them.
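The multiplicative drop can be sketched in userland as follows. This is a hypothetical model. It builds a histogram of queued fragment counts by remaining TTL, finds the TTL at which the running total first reaches half of all fragments, and discards every reassembly queue at or below that TTL. The median-TTL approach, the struct layout, FRAG_TTL_MAX, and all names are assumptions for illustration. Only the nmbclusters/4 threshold comes from the log above.

```c
#include <stddef.h>

#define FRAG_TTL_MAX 60     /* hypothetical maximum reassembly TTL */

/* One reassembly queue: its fragment count and remaining TTL. */
struct reassq {
	unsigned nfrags;
	unsigned ttl;       /* 0..FRAG_TTL_MAX; lower means older */
	int      live;
};

/* Threshold from the log: a quarter of the mbuf cluster count. */
static unsigned
maxfrags_for(unsigned nmbclusters)
{
	return nmbclusters / 4;
}

/*
 * Drop at least half of all queued fragments, oldest first.
 * Returns the number of fragments left.
 */
static unsigned
reass_drophalf(struct reassq *q, size_t nq)
{
	unsigned histo[FRAG_TTL_MAX + 1] = { 0 };
	unsigned total = 0, running = 0, cutoff, left = 0;
	size_t i;

	for (i = 0; i < nq; i++) {
		if (!q[i].live)
			continue;
		histo[q[i].ttl] += q[i].nfrags;
		total += q[i].nfrags;
	}
	if (total == 0)
		return 0;

	/* Smallest TTL whose cumulative fragment count is >= total / 2. */
	for (cutoff = 0; cutoff < FRAG_TTL_MAX; cutoff++) {
		running += histo[cutoff];
		if (2 * running >= total)
			break;
	}

	for (i = 0; i < nq; i++) {
		if (!q[i].live)
			continue;
		if (q[i].ttl <= cutoff)
			q[i].live = 0;  /* discard the whole queue */
		else
			left += q[i].nfrags;
	}
	return left;
}
```

Dropping whole queues rather than individual fragments matters because a datagram with any fragment missing can never be reassembled, so keeping its remaining fragments would only waste mbufs.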


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

backout previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
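The align-if-needed test can be sketched in user-space C as below. This is a minimal sketch assuming a hypothetical `hdr_aligned_p()` helper and a stand-in header struct; it is not the kernel's actual *_HDR_ALIGNED_P() macro. Where __NO_STRICT_ALIGNMENT is defined the test collapses to a constant 1, as the entry describes, so the compiler can discard the slow path entirely.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Stand-in for a 20-byte IPv4 header; only its alignment matters here. */
struct fake_iphdr {
	uint32_t words[5];
};

/*
 * Hypothetical helper: returns 1 if p may be dereferenced directly as
 * a struct fake_iphdr.  Platforms without strict alignment constraints
 * skip the test, so they never visit the align-if-needed path.
 */
static int
hdr_aligned_p(const void *p)
{
#ifdef __NO_STRICT_ALIGNMENT
	(void)p;
	return 1;
#else
	return ((uintptr_t)p & (alignof(struct fake_iphdr) - 1)) == 0;
#endif
}
```

A caller would copy the header into an aligned buffer (the role m_copyup() plays in the kernel) only when this returns 0.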


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench messages, based on router-req RFC (they could also be
abused as a DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
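A minimal sketch of the 127/8 check, assuming addresses are in network byte order as they sit in the header; `is_loopback_net()` and `accept_on_wire()` are hypothetical names. A real input path would presumably also need to exempt packets received on the loopback interface itself, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: nonzero if a network-order IPv4 address lies in
 * 127.0.0.0/8, which RFC 1122 says must never appear on the wire.
 */
static int
is_loopback_net(uint32_t addr_be)
{
	return (ntohl(addr_be) >> 24) == 127;
}

/* Hypothetical input check: reject if either endpoint is in 127/8. */
static int
accept_on_wire(uint32_t src_be, uint32_t dst_be)
{
	return !is_loopback_net(src_be) && !is_loopback_net(dst_be);
}
```

A rejected packet would be counted against a statistic such as ips_badaddr rather than silently dropped.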


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not a negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port numbers assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.
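The bug class here is reading an option's length byte when only one byte of option space remains. A minimal bounds-checked option walk is sketched below; `walk_ip_options()` is a hypothetical helper, not the kernel's ip_dooptions(), and only the EOL (0) and NOP (1) option codes from RFC 791 receive special treatment.

```c
#include <assert.h>
#include <stddef.h>

#define OPT_EOL	0	/* end of option list (RFC 791) */
#define OPT_NOP	1	/* no operation (RFC 791) */

/*
 * Hypothetical helper: count the option entries (NOPs included) in
 * cp[0..cnt-1], returning -1 if any option's length would run past
 * the end of the buffer.  The cnt < 2 test prevents reading cp[1]
 * when only one byte of option space is left.
 */
static int
walk_ip_options(const unsigned char *cp, size_t cnt)
{
	int nopts = 0;
	size_t optlen;

	for (; cnt > 0; cnt -= optlen, cp += optlen) {
		unsigned char opt = cp[0];

		if (opt == OPT_EOL)
			break;
		if (opt == OPT_NOP) {
			optlen = 1;
		} else {
			if (cnt < 2)
				return -1;	/* no room for the length byte */
			optlen = cp[1];
			if (optlen < 2 || optlen > cnt)
				return -1;	/* too short, or overruns buffer */
		}
		nopts++;
	}
	return nopts;
}
```

Per-option validation (for example the timestamp pointer bounds) would happen inside the loop once the length has been proven safe.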


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL
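The effect of hostzerobroadcast on the broadcast test can be sketched as follows, working in host byte order. `is_subnet_broadcast()` is a hypothetical helper, not the kernel's in_broadcast(), and it ignores the limited broadcast address 255.255.255.255 and other special cases.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (host byte order): is addr a broadcast address
 * for the subnet (net, mask)?  An all-ones host part always counts;
 * an all-zeros host part counts only when hostzerobroadcast is
 * nonzero, which is the default described above.
 */
static int
is_subnet_broadcast(uint32_t addr, uint32_t net, uint32_t mask,
    int hostzerobroadcast)
{
	if ((addr & mask) != (net & mask))
		return 0;			/* different subnet */
	if ((addr & ~mask) == ~mask)
		return 1;			/* host part all ones */
	return hostzerobroadcast && (addr & ~mask) == 0;
}
```

Setting the sysctl to zero makes address zero of each subnet an ordinary unicast address.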


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets, that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
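A minimal sketch of the length sanity checks, assuming the caller passes the first header byte and ip_len already converted to host order; `ip_lengths_sane()` is a hypothetical name. ip_hl counts 32-bit words, so shifting it left by 2 gives the header length in bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical check on a raw IPv4 header: vhl is the first header
 * byte (version in the high nibble, header length in 32-bit words in
 * the low nibble) and total_len is ip_len in host order.
 */
static int
ip_lengths_sane(uint8_t vhl, uint16_t total_len)
{
	unsigned int hlen = (unsigned int)(vhl & 0x0f) << 2;

	if (hlen < 20)		/* minimum IPv4 header is 5 words */
		return 0;
	if (total_len < hlen)	/* ip_len < ip_hl << 2: drop */
		return 0;
	return 1;
}
```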


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
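The incremental checksum adjustment mentioned above can be sketched with the update rule from RFC 1624, HC' = ~(~HC + ~m + m'), where m is the old value of the changed 16-bit word and m' the new one. This assumes the header is held as ten 16-bit words in host order; the helper names are hypothetical, and the kernel's fast-forward code may use a different but equivalent formulation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full one's-complement checksum over n 16-bit words (host order). */
static uint16_t
cksum_full(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;

	while (n-- > 0)
		sum += *w++;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'). */
static uint16_t
cksum_adjust(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
	uint32_t sum = (uint16_t)~hc + (uint16_t)~old_word + new_word;

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Decrement the TTL (high byte of word 4, next to the protocol byte)
 * and patch the checksum in word 5.  The caller must already have
 * checked that the TTL is greater than 1.
 */
static void
ip_decttl(uint16_t hdr[10])
{
	uint16_t old = hdr[4];

	hdr[4] = (uint16_t)(old - 0x0100);
	hdr[5] = cksum_adjust(hdr[5], old, hdr[4]);
}
```

Only one word changes, so the fast path touches two header fields instead of re-summing all twenty bytes.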


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
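The constraints listed above can be expressed as a small validator. This is a sketch under stated assumptions: `anonports_valid()` is hypothetical, `PORT_RESERVED` stands in for IPPORT_RESERVED (1024 in the BSD headers), and `allow_priv` models a kernel built with IPNOPRIVPORTS.

```c
#include <assert.h>

#define PORT_RESERVED	1024	/* stand-in for IPPORT_RESERVED */
#define PORT_MAX	65535

/*
 * Hypothetical validator for a proposed (anonportmin, anonportmax)
 * pair: both within [lo, 65535], where lo is the reserved-port
 * boundary unless privileged ports are allowed, and min < max.
 */
static int
anonports_valid(int min, int max, int allow_priv)
{
	int lo = allow_priv ? 0 : PORT_RESERVED;

	if (min < lo || max < lo)
		return 0;		/* negative, or a reserved port */
	if (min > PORT_MAX || max > PORT_MAX)
		return 0;
	return min < max;
}
```

A sysctl handler would run a check like this on the new value before storing it, leaving the old value in place on failure.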


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>
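Why the check is needed: the 13-bit fragment offset counts 8-byte units, so the offset alone can reach 65528 bytes, and any payload beyond a few bytes then pushes the reassembled datagram past IP_MAXPACKET (65535). A minimal sketch, assuming `frag_overflows()` is a hypothetical helper and `ip_off` is in host order:

```c
#include <assert.h>
#include <stdint.h>

#define IPOFF_MASK	0x1fff	/* fragment offset bits, as IP_OFFMASK */
#define MAXPACKET	65535	/* stand-in for IP_MAXPACKET */

/*
 * Hypothetical check: would a fragment with this ip_off field and
 * payload length, plus an hlen-byte header, end past IP_MAXPACKET
 * once the datagram is reassembled?
 */
static int
frag_overflows(uint16_t ip_off, unsigned int hlen, unsigned int payload_len)
{
	unsigned long end = (unsigned long)(ip_off & IPOFF_MASK) * 8
	    + hlen + payload_len;

	return end > MAXPACKET;
}
```

Computing the end in an unsigned long avoids wrapping the result into a small, harmless-looking value.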


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
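A minimal sketch of masking the flag bits correctly. The constant values match the ip_off definitions in <netinet/ip.h>; `is_fragment()` and `frag_byte_offset()` are hypothetical helpers. The point is to look only at MF and the offset bits, since a test such as (ip_off & ~IP_DF) would treat a packet carrying the reserved bit as a fragment.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined for ip_off in <netinet/ip.h>. */
#define IP_RF		0x8000	/* reserved fragment flag, must be zero */
#define IP_DF		0x4000	/* don't fragment */
#define IP_MF		0x2000	/* more fragments */
#define IP_OFFMASK	0x1fff	/* fragment offset, in 8-byte units */

/* Hypothetical helper: is this datagram a fragment (ip_off in host order)? */
static int
is_fragment(uint16_t ip_off)
{
	return (ip_off & (IP_MF | IP_OFFMASK)) != 0;
}

/* Byte offset of this fragment's payload within the original datagram. */
static unsigned int
frag_byte_offset(uint16_t ip_off)
{
	return (unsigned int)(ip_off & IP_OFFMASK) * 8;
}
```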


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlaid with a
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.349 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, e.g., by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is
separated out to avoid one big change that would be difficult to debug by
bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in an mbuf isn't safe once we remove the
big kernel locks; an interface object (ifnet) can be destroyed at any time
during packet processing, so accessing such an object via a pointer is racy.
Instead we have to get the object from the interface collection
(ifindex2ifnet) via an interface index (if_index) that is stored in the
mbuf instead of a pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleepable critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant for the
transition period, i.e., it is intended for places that are not planned
to be MP-ified soon.

The change adds some overhead due to psref to performance-sensitive paths,
but the overhead is not serious: 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, e.g., via settimeofday(2), according to time_second(9). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
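
A userland analogue of the time_second to time_uptime switch may help show the idea. It assumes POSIX clock_gettime(2) with CLOCK_MONOTONIC as a stand-in for the kernel's time_uptime and time(3) as a stand-in for time_second; the struct and helper names are hypothetical. Expiry is computed on the monotonic clock, and a timestamp is converted to wall-clock time only when it has to be reported to a consumer that expects one.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Seconds on a monotonic clock: unaffected by settimeofday(2). */
static time_t
mono_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec;
}

/* Hypothetical cache entry whose expiry is kept in monotonic seconds. */
struct cache_entry {
	time_t expire;
};

static void
cache_entry_set_lifetime(struct cache_entry *e, time_t lifetime)
{
	e->expire = mono_seconds() + lifetime;
}

/* A wall-clock jump cannot make an entry expire early or late. */
static int
cache_entry_expired(const struct cache_entry *e)
{
	return mono_seconds() >= e->expire;
}

/* Convert a monotonic timestamp to wall-clock time, for consumers
 * that expect a time(3)-style value. */
static time_t
mono_to_wallclock(time_t mono)
{
	return time(NULL) - (mono_seconds() - mono);
}
```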


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is
allowed. ipsec_used is set automatically based on IPsec being enabled and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only a few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
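
The 68-byte rule can be sketched as a single check, assuming ip_off is already in host byte order. MIN_FRAG_BYTES and frag_offset_ok() are hypothetical names, not the kernel's. 68 octets is the datagram size every IPv4 module must be able to forward without further fragmentation (RFC 791), so a legitimate non-initial fragment should never start inside that range. Because offsets are counted in 8-byte units, the smallest non-zero offset this accepts is 72 bytes.

```c
#include <stdint.h>

/* Minimal fragment size, in bytes (hypothetical name). */
#define MIN_FRAG_BYTES 68u

/*
 * ip_off: host-order fragment field (flags + 13-bit offset).
 * Returns 1 if the fragment passes the check, 0 if it should be dropped.
 */
static int
frag_offset_ok(uint16_t ip_off)
{
	/* The offset is stored in 8-byte units; flag bits are masked out. */
	uint32_t off_bytes = (uint32_t)(ip_off & 0x1fffu) * 8;

	/* The initial fragment (offset 0) is not affected by this rule. */
	if (off_bytes == 0)
		return 1;

	/* A non-initial fragment must not start within the first 68 bytes,
	 * where it could overwrite the transport header on reassembly. */
	return off_bytes >= MIN_FRAG_BYTES;
}
```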


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).
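A user-space sketch of the allocate/copy/dup/free contract described above. It uses calloc() in place of the per-family pools. The sketch_* names are hypothetical. Not every platform's struct sockaddr has an sa_len field, so this portable version sets only sa_family; the kernel routines also initialize sa_len.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Size of the family-specific sockaddr, or 0 for an unknown family. */
static size_t
sockaddr_size(sa_family_t af)
{
	switch (af) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}

/* Stand-in for sockaddr_alloc(): right size for the family, or NULL. */
static struct sockaddr *
sketch_sockaddr_alloc(sa_family_t af)
{
	size_t len = sockaddr_size(af);
	struct sockaddr *sa;

	if (len == 0 || (sa = calloc(1, len)) == NULL)
		return NULL;
	sa->sa_family = af;
	return sa;
}

/* Stand-in for sockaddr_copy(): families must match (the kernel KASSERTs). */
static struct sockaddr *
sketch_sockaddr_copy(struct sockaddr *dst, const struct sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, sockaddr_size(src->sa_family));
	return dst;
}

/* Stand-in for sockaddr_dup(): allocate, then copy. */
static struct sockaddr *
sketch_sockaddr_dup(const struct sockaddr *src)
{
	struct sockaddr *sa = sketch_sockaddr_alloc(src->sa_family);

	return sa == NULL ? NULL : sketch_sockaddr_copy(sa, src);
}

static void
sketch_sockaddr_free(struct sockaddr *sa)
{
	free(sa);
}
```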

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
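A minimal sketch of per-domain invalidation. It uses hypothetical, simplified types rather than the kernel's struct route and struct domain. Each cache that holds a route is linked onto its domain's list. Flushing walks that list, drops each route reference, and sets ro_rt to NULL, so callers must be prepared for a cached route to vanish.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types. */
struct rtentry_sketch {
	int refcnt;
};

struct route_cache {
	struct rtentry_sketch *ro_rt;   /* cached route, NULL if none */
	struct route_cache *next;       /* link on the domain's list */
};

struct domain_sketch {
	struct route_cache *dom_rtcache; /* live caches in this domain */
};

/* Record that `ro` now caches `rt` (cf. rtcache()). */
static void
cache_register(struct domain_sketch *dom, struct route_cache *ro,
    struct rtentry_sketch *rt)
{
	assert(ro->ro_rt == NULL);
	ro->ro_rt = rt;
	rt->refcnt++;
	ro->next = dom->dom_rtcache;
	dom->dom_rtcache = ro;
}

/* Invalidate every cache in the domain (cf. rtflushall()). */
static void
cache_flushall(struct domain_sketch *dom)
{
	struct route_cache *ro, *next;

	for (ro = dom->dom_rtcache; ro != NULL; ro = next) {
		next = ro->next;
		if (ro->ro_rt != NULL) {
			ro->ro_rt->refcnt--;
			ro->ro_rt = NULL;
		}
		ro->next = NULL;
	}
	dom->dom_rtcache = NULL;
}
```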


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement is obviously
correct "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.
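The peek-before-locking idea can be sketched as follows. The queue, the splnet_sketch()/splx_sketch() stand-ins, and the counter are all hypothetical. The counter only makes the saved spl calls visible. The sketch assumes the unlocked peek is safe because a packet enqueued afterwards would schedule the software interrupt again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet queue. */
struct pkt {
	struct pkt *next;
};

struct pktq {
	struct pkt *head;
};

static int spl_raises;          /* counts simulated splnet() calls */

static int  splnet_sketch(void) { spl_raises++; return 0; }
static void splx_sketch(int s) { (void)s; }

/* Dequeue one packet under (simulated) interrupt protection. */
static struct pkt *
dequeue(struct pktq *q)
{
	int s = splnet_sketch();
	struct pkt *p = q->head;

	if (p != NULL)
		q->head = p->next;
	splx_sketch(s);
	return p;
}

/* Drain the queue; returns the number of packets handled. */
static int
intr_sketch(struct pktq *q)
{
	int n = 0;

	/* Cheap unlocked check: nothing queued means no spl calls at all. */
	if (q->head == NULL)
		return 0;
	while (dequeue(q) != NULL)
		n++;
	return n;
}
```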


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route': call rtflush() on the source
and rtcache() on the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Before if there was no route, we would call icmp_error with a datagram
packet that has an incorrect checksum. (From Liam Foy)
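A sketch of the corrected ordering. It assumes a raw 20-byte IPv4 header with the TTL at byte 8 and the checksum at bytes 10-11. forward_sketch() and the result codes are hypothetical names. The point is that both error paths return before the header is touched, so an ICMP error would quote a header whose checksum still verifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over big-endian 16-bit words. */
static uint16_t
hdr_cksum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Recompute the checksum field of a 20-byte IPv4 header. */
static void
hdr_set_cksum(uint8_t *h)
{
	uint16_t c;

	h[10] = h[11] = 0;
	c = hdr_cksum(h, 20);
	h[10] = (uint8_t)(c >> 8);
	h[11] = (uint8_t)c;
}

enum fwd_result { FWD_OK, FWD_TIMXCEED, FWD_UNREACH_NET };

/*
 * Hypothetical forwarding decision: every error path returns before
 * the header is modified; the TTL is decremented (and the checksum
 * recomputed) only once the packet is known to be forwardable.
 */
static enum fwd_result
forward_sketch(uint8_t *h, int have_route)
{
	if (h[8] <= 1)
		return FWD_TIMXCEED;
	if (!have_route)
		return FWD_UNREACH_NET;
	h[8]--;
	hdr_set_cksum(h);
	return FWD_OK;
}
```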


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or not initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
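A sketch of the cached-tunable check and the drop-half step, with hypothetical globals named after those in the log. The entry above does not say which reassembly entries the kernel picks as victims, so the oldest-first order below is an assumption.

```c
#include <assert.h>

/* Hypothetical globals echoing the names in the log message. */
static int nmbclusters = 1024;  /* system tunable; may change at run time */
static int ip_nmbclusters;      /* cached copy of nmbclusters */
static int ip_maxfrags;         /* derived: nmbclusters / 4 */
static int ip_nfrags;           /* fragments currently queued */

/* Recompute derived limits only when the cached copy is stale. */
static void
check_nmbcluster_params(void)
{
	if (ip_nmbclusters != nmbclusters) {
		ip_nmbclusters = nmbclusters;
		ip_maxfrags = nmbclusters / 4;
	}
}

/* One reassembly entry; nfrags mirrors the per-entry ipq_nfrags count. */
struct ipq_sketch {
	int nfrags;
	int live;
};

/* Drop whole entries, oldest first, until at most half the fragments remain. */
static void
drop_half(struct ipq_sketch *q, int n)
{
	int target = ip_nfrags / 2;

	for (int i = 0; i < n && ip_nfrags > target; i++) {
		if (!q[i].live)
			continue;
		ip_nfrags -= q[i].nfrags;
		q[i].live = 0;
	}
	assert(ip_nfrags >= 0);
}

/* Per arriving fragment: refresh limits, then report whether to shed load. */
static int
over_frag_limit(void)
{
	check_nmbcluster_params();
	return ip_nfrags >= ip_maxfrags;
}
```

Dropping whole entries rather than single fragments frees every mbuf of a datagram that could no longer be completed anyway.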


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.
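A sketch of a hashed reassembly table. Fragments belong to the same datagram when source, destination, IP id, and protocol all match. The hash only narrows the search to one bucket. The bucket count and the hash function below are illustrative assumptions, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REASS_NBUCKETS 64                    /* hypothetical; power of two */
#define REASS_MASK     (REASS_NBUCKETS - 1)

struct reass_entry {
	uint32_t src, dst;          /* addresses (host order in this sketch) */
	uint16_t id;                /* IP identification field */
	uint8_t proto;
	struct reass_entry *next;   /* bucket chain */
};

static struct reass_entry *reass_table[REASS_NBUCKETS];

/* Illustrative hash on (src, id); not the kernel's function. */
static unsigned
reass_hash(uint32_t src, uint16_t id)
{
	return (unsigned)((src ^ (src >> 16) ^ id) & REASS_MASK);
}

/* Search only the one bucket the key hashes to. */
static struct reass_entry *
reass_lookup(uint32_t src, uint32_t dst, uint16_t id, uint8_t proto)
{
	struct reass_entry *e;

	for (e = reass_table[reass_hash(src, id)]; e != NULL; e = e->next)
		if (e->src == src && e->dst == dst && e->id == id &&
		    e->proto == proto)
			return e;
	return NULL;
}

static void
reass_insert(struct reass_entry *e)
{
	unsigned h = reass_hash(e->src, e->id);

	e->next = reass_table[h];
	reass_table[h] = e;
}
```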


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
'struct ifq' statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() corrides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
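When ip_len and ip_off stay in network byte order in the mbuf, readers convert on access instead of swapping the fields in place. Below is a sketch using a hypothetical ip_sketch struct. Its field names follow struct ip, and the flag constants mirror IP_MF and IP_OFFMASK.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the leading fields of struct ip. */
struct ip_sketch {
	uint8_t  ip_vhl;
	uint8_t  ip_tos;
	uint16_t ip_len;   /* network byte order; never swapped in place */
	uint16_t ip_id;
	uint16_t ip_off;   /* network byte order: flags + fragment offset */
};

#define IP_MF_SKETCH      0x2000  /* "more fragments" flag */
#define IP_OFFMASK_SKETCH 0x1fff  /* fragment offset, 8-byte units */

/* Read the total length without modifying the packet. */
static unsigned
ip_total_len(const struct ip_sketch *ip)
{
	return ntohs(ip->ip_len);
}

/* A datagram is a fragment if MF is set or the offset is non-zero. */
static int
ip_is_fragment(const struct ip_sketch *ip)
{
	uint16_t off = ntohs(ip->ip_off);

	return (off & (IP_MF_SKETCH | IP_OFFMASK_SKETCH)) != 0;
}
```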


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
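A flat-buffer sketch of the align-if-needed path, assuming the header needs 4-byte alignment. HDR_ALIGNED_P() and hdr_align() are hypothetical stand-ins for the *_HDR_ALIGNED_P() macros and m_copyup(). On __NO_STRICT_ALIGNMENT architectures the real macros expand to 1, so the copy path is never taken there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical alignment test: true if `p` is 4-byte aligned. */
#define HDR_ALIGNED_P(p) ((((uintptr_t)(p)) & 3) == 0)

/*
 * Return `hdr` itself if already aligned; otherwise copy `len` bytes
 * into the aligned scratch buffer and return that.  Returns NULL if
 * the scratch buffer is too small.
 */
static const void *
hdr_align(const void *hdr, size_t len, uint32_t *scratch, size_t nwords)
{
	if (HDR_ALIGNED_P(hdr))
		return hdr;
	if (len > nwords * sizeof(*scratch))
		return NULL;
	memcpy(scratch, hdr, len);
	assert(HDR_ALIGNED_P(scratch));
	return scratch;
}
```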


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench messages, based on the router-req RFC (they could
also be abused as a DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from the ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, but it may not
be enough when you take hundreds of thousands of tcp connections on
your system. if you have a proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not a negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during the KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch them up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets, that have a Class-D address as source address.
Implements the first half of PR 7003.


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently only netatalk and
netinet are supported, and it has only been tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlaid with a
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.348 24-Jan-2017 ozaki-r

Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work


Revision tags: bouyer-socketcan-base pgoyette-localcount-20170107
# 1.347 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for the MP-safe routing table. It is separated
to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant for the
transition moratorium, i.e., it is intended to be used in places that are
not planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, according to time_second(9), time_second
doesn't increase monotonically and can leap, e.g., via settimeofday(2).
We should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
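The conversion described above only changes the time base while keeping the distance from the current moment. A minimal sketch, assuming hypothetical helper names and passing both clocks in as parameters instead of reading the kernel's time_uptime and time_second globals:

```c
#include <time.h>

/*
 * Hypothetical helpers.  now_uptime and now_wall stand in for the kernel's
 * time_uptime (seconds since boot, never leaps) and time_second
 * (wall-clock seconds, may leap), both sampled at the same moment.
 */
static time_t
uptime_to_wall(time_t t_uptime, time_t now_uptime, time_t now_wall)
{
	/* Preserve the offset from "now"; only the time base changes. */
	return t_uptime - now_uptime + now_wall;
}

static time_t
wall_to_uptime(time_t t_wall, time_t now_uptime, time_t now_wall)
{
	return t_wall - now_wall + now_uptime;
}

/* An expiry check done purely in uptime is unaffected by settimeofday(2). */
static int
expired(time_t expire_uptime, time_t now_uptime)
{
	return now_uptime >= expire_uptime;
}
```

Keeping all internal bookkeeping in uptime and converting only at the user/kernel boundary means a wall-clock jump changes what userland is shown, not when a cache entry actually expires.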


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
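RFC 5227 (section 2.1.1) gives two conflict rules while an address is being probed: any ARP Request or Reply whose sender IP is the candidate address means the address is in use, and an ARP Probe (sender IP 0.0.0.0) for the same target from a different hardware address means another host is probing for it simultaneously. A minimal sketch of that decision follows; the arp_view struct, the host-byte-order addresses, and the single local interface are assumptions for illustration, not NetBSD's ARP code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of the ARP fields that matter here. */
struct arp_view {
	uint8_t  sender_hw[6];
	uint32_t sender_ip;	/* 0.0.0.0 in an ARP Probe */
	uint32_t target_ip;
};

/*
 * Does this received ARP packet conflict with the tentative address we
 * are probing for?  my_hw is the (single, assumed) local interface address.
 */
static bool
dad_conflict(const struct arp_view *a, uint32_t tentative,
    const uint8_t my_hw[6])
{
	/* Any Request or Reply claiming our candidate as its sender IP. */
	if (a->sender_ip == tentative)
		return true;
	/* Another host's Probe for the same address. */
	if (a->sender_ip == 0 && a->target_ip == tentative &&
	    memcmp(a->sender_hw, my_hw, sizeof(a->sender_hw)) != 0)
		return true;
	return false;
}
```

A host with several interfaces would compare sender_hw against all of its own hardware addresses rather than a single one.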


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on IPsec being enabled and
rules existing.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use an array to store hooks and thus be more cache
friendly (there are only a few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
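With the statistics held as a flat array of uint64_t counters (the layout from 1.264), collating per-CPU copies reduces to summing rows. A minimal sketch, assuming a fixed hypothetical CPU count and made-up counter indices; it does not use percpu(9) or netstat_sysctl(), and it ignores how a real kernel keeps the incrementing thread on its own CPU.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes and counter indices, for illustration only. */
#define SKETCH_NCPU	4
enum { STAT_TOTAL, STAT_BADSUM, STAT_DELIVERED, STAT_NSTATS };

/* One row of counters per CPU; each CPU only ever writes its own row. */
static uint64_t percpu_ipstat[SKETCH_NCPU][STAT_NSTATS];

static void
ipstat_inc(unsigned cpu, unsigned which)
{
	percpu_ipstat[cpu][which]++;
}

/* Collate on demand, e.g. when a sysctl reader asks for the totals. */
static void
ipstat_collate(uint64_t out[STAT_NSTATS])
{
	memset(out, 0, STAT_NSTATS * sizeof(out[0]));
	for (unsigned c = 0; c < SKETCH_NCPU; c++)
		for (unsigned s = 0; s < STAT_NSTATS; s++)
			out[s] += percpu_ipstat[c][s];
}
```

The hot path touches only CPU-local memory, and the cost of summing is paid only by the comparatively rare reader.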


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
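One way a Fisher-Yates shuffle over a sliding window can produce IP IDs is sketched below. This is a hypothetical illustration, not the code from this revision: the names, the 32768-slot look-back window, and the toy xorshift PRNG are assumptions, and the PRNG is not suitable for real ID generation. The table starts as a random permutation of all 16-bit values (built with the "inside-out" Fisher-Yates shuffle); each call hands out the entry at the current index and swaps it into a random slot among the previous 32768 positions.

```c
#include <stdint.h>

#define ID_SPACE  65536u	/* all possible 16-bit IDs */
#define ID_WINDOW 32768u	/* how far back a used ID is re-inserted */

static uint16_t id_table[ID_SPACE];
static uint32_t id_index;
static int id_initialized;
static uint32_t prng_state = 2463534242u;

/* Toy xorshift32 PRNG: deterministic and NOT cryptographically secure. */
static uint32_t
toy_random(void)
{
	prng_state ^= prng_state << 13;
	prng_state ^= prng_state >> 17;
	prng_state ^= prng_state << 5;
	return prng_state;
}

/* "Inside-out" Fisher-Yates: fill id_table with a random permutation. */
static void
id_init(void)
{
	for (uint32_t i = 0; i < ID_SPACE; i++) {
		uint32_t j = toy_random() % (i + 1);
		id_table[i] = id_table[j];
		id_table[j] = (uint16_t)i;
	}
	id_initialized = 1;
}

uint16_t
random_ip_id(void)
{
	uint16_t id;

	if (!id_initialized)
		id_init();
	do {
		uint32_t i = id_index % ID_SPACE;
		uint32_t back = toy_random() % ID_WINDOW;	/* 0..32767 */
		uint32_t j = (i + ID_SPACE - back) % ID_SPACE;

		/* Take the ID at the current slot and park it behind us. */
		id = id_table[i];
		id_table[i] = id_table[j];
		id_table[j] = id;
		id_index++;
	} while (id == 0);	/* ID 0 is never handed out */
	return id;
}
```

Because a used ID is always parked behind the advancing index, it can only come up again once the index wraps around to it, which takes at least 65536 - 32767 = 32769 table steps; this bounds how soon an ID can repeat while keeping the order unpredictable.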


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first fragment is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
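The 68-byte figure comes from RFC 791: an IPv4 header can be up to 60 bytes and the smallest fragment carries 8 bytes of data. A minimal sketch of the non-initial-fragment rule above together with the first-fragment rule from 1.259 follows; the function name and parameters are hypothetical, and comparing the datagram's total length for the first fragment is an assumption. Since offsets are encoded in 8-byte units, the smallest accepted non-initial offset is effectively 72 bytes (9 units).

```c
#include <stdbool.h>
#include <stdint.h>

/* RFC 791: max IPv4 header (60 bytes) + minimum fragment data (8 bytes). */
#define MIN_FRAG_BYTES 68u

/*
 * Hypothetical check.  frag_off_units is the 13-bit fragment offset field
 * (in 8-byte units), more_frags is the MF flag, and total_len is the
 * datagram's total length in bytes.
 */
static bool
fragment_acceptable(uint16_t frag_off_units, bool more_frags,
    uint16_t total_len)
{
	uint32_t off_bytes = (uint32_t)(frag_off_units & 0x1fff) * 8;

	if (off_bytes == 0 && !more_frags)
		return true;				/* not fragmented */
	if (off_bytes == 0)
		return total_len >= MIN_FRAG_BYTES;	/* first fragment */
	return off_bytes >= MIN_FRAG_BYTES;		/* later fragment */
}
```

Rejecting small offsets prevents a later fragment from overlapping and rewriting the transport header carried in the first fragment.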


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC code paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement is obviously
correct "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Before, if there was no route, we would call icmp_error with a datagram
packet that has an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags, which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpacketsr are now declared static
and uninitialized (bss), to discourage tampering with them.


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() collides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variable/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume these
files exist, so we provide them.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL
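
A minimal C sketch of the decision that hostzerobroadcast controls. It assumes addresses and masks are held in host byte order. is_subnet_broadcast() is a hypothetical helper, not the kernel's in_broadcast(), and the global here is a local stand-in for the real variable. It ignores the /31 and /32 corner cases.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel global; nonzero by default, as in the log. */
static int hostzerobroadcast = 1;

/* Is dst a broadcast address for the subnet (subnet, mask)? Host byte order. */
static bool is_subnet_broadcast(uint32_t dst, uint32_t subnet, uint32_t mask)
{
    uint32_t host_part = dst & ~mask;

    if ((dst & mask) != (subnet & mask))
        return false;                    /* not on this subnet at all */
    if (host_part == ~mask)
        return true;                     /* all-ones host part: always broadcast */
    if (host_part == 0 && hostzerobroadcast)
        return true;                     /* all-zeros host part: old-style broadcast */
    return false;
}
```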


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets that have a Class-D address as their source address.
Implements the first half of PR 7003.
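
A sketch of the Class-D source test, assuming a host-byte-order address. Class D is 224.0.0.0/4, so only the top four bits matter. src_is_class_d() is a hypothetical name; the kernel's own check is likely written with the IN_MULTICAST() macro instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if src (host byte order) lies in 224.0.0.0/4, i.e. starts with bits 1110. */
static bool src_is_class_d(uint32_t src)
{
    return (src & 0xf0000000u) == 0xe0000000u;
}
```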


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-address input/output statistics. Currently only netatalk and
netinet are supported, and only netinet has been tested.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
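
The check can be sketched in C as follows, assuming the raw header sits in a byte buffer. ip_lengths_ok() is a hypothetical helper; the real ip_input() works on mbufs and performs further checks (checksum, version, and so on).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Basic length checks on a raw IPv4 header; buflen is the number of bytes received. */
static bool ip_lengths_ok(const uint8_t *buf, size_t buflen)
{
    if (buflen < 20)
        return false;                                   /* shorter than a minimal header */
    unsigned hlen = (unsigned)(buf[0] & 0x0f) << 2;     /* ip_hl counts 32-bit words */
    unsigned ip_len = ((unsigned)buf[2] << 8) | buf[3]; /* total length, network order */
    if (hlen < 20)
        return false;                                   /* ip_hl < 5 is malformed */
    if (ip_len < hlen)
        return false;                                   /* the ip_len < ip_hl << 2 case */
    if (ip_len > buflen)
        return false;                                   /* claims more data than arrived */
    return true;
}
```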


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
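
The incremental adjustment can be illustrated with the RFC 1624 update rule HC' = ~(~HC + ~m + m'), where m is the 16-bit word holding TTL and protocol. This is a sketch and not the ipflow code. The header is modeled as ten 16-bit words already converted to host order, the helper names are hypothetical, and the caller is assumed to have already checked the TTL.

```c
#include <stddef.h>
#include <stdint.h>

/* Full one's-complement checksum over n 16-bit words (used to verify the update). */
static uint16_t cksum_words(const uint16_t *w, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += w[i];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);   /* fold end-around carries */
    return (uint16_t)~sum;
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m') */
static uint16_t cksum_adjust(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~hc;
    sum += (uint16_t)~old_word;
    sum += new_word;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Decrement the TTL (high byte of word 4) and patch the checksum (word 5). */
static void ttl_decrement(uint16_t hdr[10])
{
    uint16_t old_word = hdr[4];
    uint16_t new_word = (uint16_t)(old_word - 0x0100);   /* TTL - 1, protocol unchanged */
    hdr[4] = new_word;
    hdr[5] = cksum_adjust(hdr[5], old_word, new_word);
}
```

Only the changed word is touched, which is what lets a fast path skip a full checksum recomputation.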


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
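
A sketch of the range validation described above. ANONPORT_FLOOR stands in for IPPORT_RESERVED (1024), and anonport_range_ok() is a hypothetical name. The IPNOPRIVPORTS exception is not modeled.

```c
#include <stdbool.h>

#define ANONPORT_FLOOR   1024   /* stands in for IPPORT_RESERVED */
#define ANONPORT_CEILING 65535  /* largest valid TCP/UDP port */

/* Both bounds must lie in [ANONPORT_FLOOR, ANONPORT_CEILING] and min < max. */
static bool anonport_range_ok(int min, int max)
{
    if (min < ANONPORT_FLOOR || min > ANONPORT_CEILING)
        return false;
    if (max < ANONPORT_FLOOR || max > ANONPORT_CEILING)
        return false;
    return min < max;
}
```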


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>
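
A sketch of this sanity check. It assumes the caller knows the header length, plus the byte offset and payload length of the last fragment. reassembled_len_ok() is hypothetical. IP_MAXPACKET is 65535, as in <netinet/ip.h>.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MAXPACKET 65535u   /* same value as in <netinet/ip.h> */

/* hlen: header bytes; frag_off/frag_len: byte offset and payload length of the last fragment. */
static bool reassembled_len_ok(unsigned hlen, unsigned frag_off, unsigned frag_len)
{
    /* Sum in 32 bits so an oversized total cannot wrap the way a 16-bit ip_len would. */
    uint32_t total = (uint32_t)hlen + frag_off + frag_len;
    return total <= IP_MAXPACKET;
}
```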


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
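
For comparison, here is a sketch of fragment detection that masks out the reserved bit. It assumes ip_off is already in host byte order. The flag values match <netinet/ip.h>, while needs_reassembly() and frag_byte_offset() are hypothetical helpers. A test of the form ip_off & ~IP_DF, which the 4.4BSD-derived input path appears to have used, would treat a set IP_RF bit as a fragment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the 16-bit ip_off field (host byte order). */
#define IP_RF      0x8000u  /* reserved fragment flag */
#define IP_DF      0x4000u  /* don't fragment */
#define IP_MF      0x2000u  /* more fragments */
#define IP_OFFMASK 0x1fffu  /* fragment offset, in 8-byte units */

/* Reassembly is needed only if MF is set or the offset is non-zero; RF and DF are ignored. */
static bool needs_reassembly(uint16_t ip_off)
{
    return (ip_off & (IP_MF | IP_OFFMASK)) != 0;
}

/* Byte offset of this fragment's data; masking keeps RF/DF/MF out of the result. */
static unsigned frag_byte_offset(uint16_t ip_off)
{
    return (unsigned)(ip_off & IP_OFFMASK) << 3;
}
```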


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.347 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.346 08-Dec-2016 ozaki-r

Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.


# 1.345 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is part of the changes for an MP-safe routing table. It is split
out to avoid one big change that would be difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204 pgoyette-localcount-20161104
# 1.344 18-Oct-2016 ozaki-r

Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@


# 1.343 18-Oct-2016 ozaki-r

Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522


# 1.342 11-Oct-2016 ozaki-r

Fix kernel builds with IFA_STATS


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.341 07-Sep-2016 roy

Disallow input to detached addresses because they are not yet valid.


# 1.340 31-Aug-2016 ozaki-r

Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@


Revision tags: pgoyette-localcount-20160806
# 1.339 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


# 1.338 26-Jul-2016 ozaki-r

Fix downmatch increment


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907
# 1.337 08-Jul-2016 ozaki-r

branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.


# 1.336 07-Jul-2016 ozaki-r

Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.335 06-Jul-2016 ozaki-r

Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we confirm
that nobody actually does.


# 1.334 06-Jul-2016 ozaki-r

Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.


# 1.333 04-Jul-2016 ozaki-r

Separate IP address matching functions

No functional change intended.


# 1.332 30-Jun-2016 ozaki-r

Tidy up goto labels

No functional change.


# 1.331 30-Jun-2016 ozaki-r

Fix error paths

Some error paths did m_put_rcvif_psref twice.


# 1.330 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.329 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in an mbuf isn't safe if we remove the big
kernel locks; an interface object (ifnet) can be destroyed at any time during
packet processing, and accessing such an object via a pointer is racy. Instead
we have to get the object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places where are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance-sensitive paths;
however, the overhead is not serious: 2% slower at worst.

Proposed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20160529 nick-nhusb-base-20160422 nick-nhusb-base-20160319
# 1.328 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.327 21-Jan-2016 riastradh

Give proper prototype to ip_output.


# 1.326 08-Jan-2016 knakahara

eliminate ip_input.c and ip6_input.c dependency on gif(4)


Revision tags: nick-nhusb-base-20151226
# 1.325 13-Oct-2015 roy

Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.


Revision tags: nick-nhusb-base-20150921
# 1.324 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.323 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, according to time_second(9), time_second
doesn't increase monotonically and can leap, for example on settimeofday(2).
We should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


Revision tags: nick-nhusb-base-20150606
# 1.322 02-May-2015 joerg

Fix !ARP build.


# 1.321 02-May-2015 roy

Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@


Revision tags: nick-nhusb-base-20150406
# 1.320 26-Mar-2015 ozaki-r

Tidy up the regular path of ip_forward

No functional change is intended.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.319 16-Jun-2014 ozaki-r

branches: 1.319.4;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@


# 1.318 05-Jun-2014 rmind

- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.


# 1.317 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically, based on whether IPsec is enabled and
whether rules exist.


# 1.316 29-May-2014 rmind

Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.


# 1.315 28-May-2014 christos

CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.


# 1.314 23-May-2014 rmind

ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.


# 1.313 23-May-2014 rmind

Make ip_forward() static, there is no need to expose it.


# 1.312 23-May-2014 rmind

Make ip_input() static, there is no need to expose it.


# 1.311 22-May-2014 rmind

- Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 rmind-smpnet-nbase rmind-smpnet-base
# 1.310 19-Mar-2014 liamjfoy

branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@


Revision tags: riastradh-drm2-base3
# 1.309 25-Feb-2014 pooka

Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.308 29-Jun-2013 rmind

- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only a few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.


# 1.307 27-Jun-2013 christos

branches: 1.307.2;
flip src/dst


# 1.306 27-Jun-2013 christos

implement IP_PKTINFO and IP_RECVPKTINFO.


# 1.305 08-Jun-2013 rmind

Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.


# 1.304 05-Jun-2013 christos

IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7
# 1.303 29-Nov-2012 christos

Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.


Revision tags: yamt-pagecache-base6
# 1.302 25-Jun-2012 christos

branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt


# 1.301 22-Jun-2012 christos

PR/46602: Move the rfc6056 port randomization to the IP layer.


# 1.300 02-Jun-2012 dsl

Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.299 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.298 09-Jan-2012 liamjfoy

check against NULL


# 1.297 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.296 31-Aug-2011 plunky

branches: 1.296.2; 1.296.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.295 03-May-2011 dyoung

*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.


# 1.294 14-Apr-2011 dyoung

In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.


Revision tags: bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.293 13-Dec-2010 matt

branches: 1.293.2;
Back out rev that shouldn't have been committed.


# 1.292 11-Dec-2010 matt

Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.


Revision tags: uebayasi-xip-base4
# 1.291 05-Nov-2010 rmind

ip_randomid: make mechanism MP-safe and more modular.

OK matt@


# 1.290 05-Nov-2010 rmind

ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.


Revision tags: uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.289 19-Jul-2010 rmind

Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@


# 1.288 13-Jul-2010 rmind

Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@


# 1.287 09-Jul-2010 rmind

ip_input: move lookup for fragment queue a little bit further. OK matt@.


Revision tags: uebayasi-xip-base1
# 1.286 01-Apr-2010 tls

As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.


# 1.285 31-Mar-2010 tls

Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.


Revision tags: yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.284 16-Sep-2009 pooka

branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base
# 1.283 17-Jul-2009 minskim

Delete trailing whitespace.


Revision tags: yamt-nfs-mp-base6
# 1.282 16-Jul-2009 minskim

Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.


Revision tags: yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.281 18-Apr-2009 tsutsui

Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch


# 1.280 15-Apr-2009 elad

Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.


# 1.279 18-Mar-2009 cegger

bcopy -> memcpy


Revision tags: nick-hppapmap-base2
# 1.278 19-Jan-2009 christos

branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.


Revision tags: mjf-devfs2-base
# 1.277 17-Dec-2008 cegger

kill MALLOC and FREE macros.


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base haad-dm-base
# 1.276 23-Nov-2008 rmind

ip_input: fix an IPQ "lock" leak. (hi <matt>!)


Revision tags: netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4
# 1.275 04-Oct-2008 pooka

branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.274 05-Sep-2008 seanb

Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.


# 1.273 20-Aug-2008 matt

Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.272 05-May-2008 ad

branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.


# 1.271 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


# 1.270 02-May-2008 ad

PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.


# 1.269 28-Apr-2008 martin

Remove clause 3 and 4 from TNF licenses


Revision tags: yamt-nfs-mp-base
# 1.268 24-Apr-2008 ad

branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.


# 1.267 23-Apr-2008 thorpej

Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.266 12-Apr-2008 thorpej

branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
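A minimal sketch of per-CPU counters that are summed only when read, assuming a fixed, hypothetical CPU count and hypothetical counter names (`IPS_TOTAL` and so on). In a real kernel each increment would run on the current CPU with preemption disabled, and a collated snapshot may be slightly stale; this single-threaded sketch only shows the data layout and the collation step.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical counter indices; a real stack defines many more. */
enum { IPS_TOTAL, IPS_BADSUM, IPS_FRAGMENTS, IPS_NSTATS };

#define NCPU 4	/* hypothetical, fixed CPU count */

/* One private row of 64-bit counters per CPU; increments never share a row. */
static uint64_t ipstat_percpu[NCPU][IPS_NSTATS];

/* Called on the CPU that handled the packet. */
static void
ipstat_inc(unsigned cpu, unsigned stat)
{
	ipstat_percpu[cpu][stat]++;
}

/* Collate on read, e.g. from a sysctl handler: sum every CPU's row. */
static void
ipstat_collate(uint64_t out[IPS_NSTATS])
{
	memset(out, 0, IPS_NSTATS * sizeof(out[0]));
	for (unsigned c = 0; c < NCPU; c++)
		for (unsigned s = 0; s < IPS_NSTATS; s++)
			out[s] += ipstat_percpu[c][s];
}
```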


# 1.265 09-Apr-2008 thorpej

- ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).


# 1.264 07-Apr-2008 thorpej

Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.


# 1.263 27-Mar-2008 cube

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase mjf-devfs-base hpcarm-cleanup-base
# 1.262 06-Feb-2008 matt

branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
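A minimal sketch of a Fisher-Yates permutation of the 16-bit ID space combined with a sliding window, assuming a 32768-entry window. The xorshift generator is only a stand-in for arc4random(), and every name here is hypothetical; this is one way to realize the idea, not the code that was committed. Each call takes the entry at the current index and swaps it to a random slot at most `IPID_WINDOW - 1` positions behind the index, so the index must advance more than `IPID_WINDOW` steps before it can reach that value again.

```c
#include <stdint.h>

#define IPID_SPACE	65536u	/* all 16-bit values */
#define IPID_WINDOW	32768u	/* hypothetical no-reuse window */

static uint16_t ipid_perm[IPID_SPACE];
static uint32_t ipid_index;
static int ipid_ready;
static uint64_t rng_state = 0x9e3779b97f4a7c15ULL;	/* hypothetical seed */

/* xorshift64: stand-in PRNG for the sketch; a kernel would use a CSPRNG. */
static uint64_t
rng_next(void)
{
	rng_state ^= rng_state << 13;
	rng_state ^= rng_state >> 7;
	rng_state ^= rng_state << 17;
	return rng_state;
}

/* Fill the table with a random permutation of 0..65535 (Fisher-Yates). */
static void
ipid_init(void)
{
	for (uint32_t i = 0; i < IPID_SPACE; i++)
		ipid_perm[i] = (uint16_t)i;
	for (uint32_t i = IPID_SPACE - 1; i > 0; i--) {
		/* Modulo reduction has a slight bias; acceptable for a sketch. */
		uint32_t j = (uint32_t)(rng_next() % (i + 1));
		uint16_t t = ipid_perm[i];
		ipid_perm[i] = ipid_perm[j];
		ipid_perm[j] = t;
	}
	ipid_ready = 1;
}

/*
 * Return the entry at the current index after swapping it into a random
 * slot up to IPID_WINDOW - 1 positions behind the index.  Repeats of
 * the same ID end up at least IPID_WINDOW calls apart.  ID 0 is skipped.
 */
static uint16_t
ipid_next(void)
{
	uint16_t r;

	if (!ipid_ready)
		ipid_init();
	do {
		uint32_t i = ipid_index & (IPID_SPACE - 1);
		uint32_t back = (uint32_t)(rng_next() % IPID_WINDOW);
		uint32_t j = (ipid_index - back) & (IPID_SPACE - 1);

		r = ipid_perm[i];
		ipid_perm[i] = ipid_perm[j];
		ipid_perm[j] = r;
		ipid_index++;
	} while (r == 0);
	return r;
}
```

Moving the picked value backward only ever delays when the index reaches it, which is what makes the window guarantee hold; the price is that IDs are not uniformly random across the whole space.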


Revision tags: bouyer-xeni386-nbase bouyer-xeni386-base
# 1.261 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().


Revision tags: vmlocking2-base3 matt-armv6-base
# 1.260 22-Dec-2007 matt

Fix offset calculation.
Make sure that all frags use the same TOS.


# 1.259 21-Dec-2007 matt

Also make sure the first is at least 68 bytes long.


# 1.258 21-Dec-2007 matt

Prevent TCP blind data attacks by not allowing non-initial fragments to
start at less than 68 bytes (minimal fragment size).
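A hedged sketch of the acceptance rule described in this entry and in 1.259, assuming `ip_off` is already in host byte order and that the 68-byte floor for the first fragment is applied to its total length including the header; the actual checks in ip_input.c may differ in both respects. `MINFRAG` and `frag_acceptable()` are hypothetical names. Because offsets are counted in 8-byte units, the smallest offset a non-initial fragment can then use is 72 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MF		0x2000	/* "more fragments" bit of ip_off */
#define IP_OFFMASK	0x1fff	/* fragment offset, in 8-byte units */
#define MINFRAG		68	/* hypothetical name for the 68-byte floor */

/*
 * Return true if a fragment passes the 68-byte rules.  ip_off is in
 * host byte order; ip_len is the fragment's total length, header included.
 */
static bool
frag_acceptable(uint16_t ip_off, uint16_t ip_len)
{
	unsigned off = (unsigned)(ip_off & IP_OFFMASK) * 8;	/* offset in bytes */

	if (off == 0) {
		/* Unfragmented datagram: these rules do not apply. */
		if ((ip_off & IP_MF) == 0)
			return true;
		/* First of several fragments: must be at least MINFRAG long. */
		return ip_len >= MINFRAG;
	}
	/* Non-initial fragment: may not start inside the first MINFRAG bytes. */
	return off >= MINFRAG;
}
```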


# 1.257 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 vmlocking-nbase jmcneill-pm-base reinoud-bufcleanup-base
# 1.256 26-Nov-2007 yamt

branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.


Revision tags: bouyer-xenamd64-base2 bouyer-xenamd64-base
# 1.255 09-Nov-2007 kefren

Don't MCLAIM in ipintr() because we do it anyway in ip_input()


Revision tags: jmcneill-base yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 vmlocking-base
# 1.254 02-Oct-2007 dyoung

branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.


Revision tags: yamt-x86pmap-base
# 1.253 11-Sep-2007 degroote

branches: 1.253.2;
In some FAST_IPSEC paths, the spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800


Revision tags: nick-csl-alignment-base5
# 1.252 30-Aug-2007 dyoung

Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.


# 1.251 10-Aug-2007 dyoung

branches: 1.251.2;
Use sockaddr_dl_init().


Revision tags: matt-mips64-base
# 1.250 19-Jul-2007 dyoung

branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base yamt-idlelwp-base8 mjf-ufs-trans-base
# 1.249 02-May-2007 dyoung

branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.248 25-Mar-2007 liamjfoy

Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.


# 1.247 24-Mar-2007 liamjfoy

Don't call ip*flow_reap if we're just looking up maxflows


# 1.246 12-Mar-2007 ad

branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.


# 1.245 05-Mar-2007 liamjfoy

branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@


# 1.244 04-Mar-2007 christos

Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.243 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.242 29-Jan-2007 dyoung

branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so that the statement's correctness
is evident "at a glance" (or so I hope).


# 1.241 22-Dec-2006 ad

ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.240 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.239 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

When moving a 'struct route', take care to rtflush() the source and
rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.238 06-Dec-2006 dyoung

KNF.


# 1.237 06-Dec-2006 dyoung

KNF.


Revision tags: netbsd-4-0-RC1 netbsd-4-base
# 1.236 16-Nov-2006 christos

branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.235 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


# 1.234 10-Oct-2006 dogcow

change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)


# 1.233 05-Oct-2006 tls

Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html


# 1.232 19-Sep-2006 elad

Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9
# 1.231 13-Sep-2006 elad

branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.


# 1.230 08-Sep-2006 elad

First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; it's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)


Revision tags: yamt-pdpolicy-base8 rpaulo-netinet-merge-pcb-base
# 1.229 30-Aug-2006 christos

branches: 1.229.2;
fix initializer


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.228 30-Jul-2006 elad

ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.227 07-Jun-2006 kardel

merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.226 08-May-2006 liamjfoy

branches: 1.226.2;
#if -> #ifdef

ok christos


# 1.225 15-Apr-2006 christos

Coverity CID 1134: Protect against NULL deref.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 yamt-pdpolicy-base yamt-uio_vmspace-base5
# 1.224 18-Feb-2006 joerg

branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.


# 1.223 24-Dec-2005 perry

branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.


# 1.222 11-Dec-2005 christos

merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 ktrace-lwp-base
# 1.221 01-Nov-2005 christos

Don't decrement the ttl, until we are sure that we can forward this packet.
Before if there was no route, we would call icmp_error with a datagram
packet that has an incorrect checksum. (From Liam Foy)


Revision tags: yamt-vop-base2 thorpej-vnode-attr-base
# 1.220 23-Oct-2005 christos

No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.


Revision tags: yamt-vop-base
# 1.219 05-Aug-2005 elad

branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.


# 1.218 28-Jun-2005 seanb

branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.


# 1.217 09-Jun-2005 atatat

Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.


# 1.216 01-Jun-2005 blymn

Unconstify rnode to prevent compile error when GATEWAY option set.


Revision tags: kent-audio2-base
# 1.215 29-Apr-2005 yamt

move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.


# 1.214 18-Apr-2005 yamt

fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.


# 1.213 29-Mar-2005 yamt

ip_reass: clear stale csum_flags.


Revision tags: yamt-km-base4 yamt-km-base3 netbsd-3-base
# 1.212 26-Feb-2005 perry

branches: 1.212.2;
nuke trailing whitespace


Revision tags: yamt-km-base2
# 1.211 03-Feb-2005 perry

ANSIfy function declarations


# 1.210 02-Feb-2005 perry

de-__P -- will ANSIfy .c files later.


Revision tags: yamt-km-base
# 1.209 24-Jan-2005 matt

branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.


Revision tags: kent-audio1-beforemerge
# 1.208 19-Dec-2004 christos

branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.


# 1.207 17-Dec-2004 christos

Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out


# 1.206 15-Dec-2004 thorpej

Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo


Revision tags: kent-audio1-base
# 1.205 06-Oct-2004 darrenr

Add a comment to document what setting "srcrt" is really on about in ipintr()


# 1.204 29-Sep-2004 christos

PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum


Revision tags: BEFORE-IPF413
# 1.203 25-May-2004 atatat

Sysctl descriptions under net subtree (net.key not done)


# 1.202 02-May-2004 darrenr

at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.


# 1.201 01-May-2004 matt

Use EVCNT_ATTACH_STATIC{,2}


# 1.200 25-Apr-2004 simonb

Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.


# 1.199 22-Apr-2004 matt

Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).


# 1.198 01-Apr-2004 matt

In ip_reass_ttl_descr, make i signed since it's compared to >= 0


Revision tags: netbsd-2-0-base BEFORE-IPF411
# 1.197 24-Mar-2004 atatat

branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.196 15-Jan-2004 itojun

correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy


# 1.195 14-Dec-2003 thorpej

Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().


# 1.194 14-Dec-2003 jonathan

Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%.

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpackets are now declared static
and uninitialized (bss), to discourage tampering with them.
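The two mechanisms in this entry can be sketched as follows: limits derived from `nmbclusters` are cached and recomputed only when the cached copy differs, and the drop-half pass uses a histogram of queue TTLs to find the smallest cutoff that covers at least half of the held fragments, then drops every queue at or below it. This assumes a lower TTL means closer to expiry; the structure and function names are hypothetical, and this is not the committed code.

```c
#include <stddef.h>

#define FRAG_TTL_MAX 255	/* hypothetical upper bound on queue TTLs */

/* Hypothetical stand-ins for nmbclusters and the limit derived from it. */
static int nmbclusters = 2048;
static int ip_nmbclusters;	/* cached copy of nmbclusters */
static int ip_maxfrags;		/* drop-half threshold */

/* Recompute derived limits only when nmbclusters has changed. */
static void
frag_params_check(void)
{
	if (ip_nmbclusters != nmbclusters) {
		ip_nmbclusters = nmbclusters;
		ip_maxfrags = nmbclusters / 4;
	}
}

/* Nonzero when the total fragment count exceeds the current limit. */
static int
frag_over_limit(int nfrags_total)
{
	frag_params_check();
	return nfrags_total > ip_maxfrags;
}

/* Hypothetical summary of one reassembly queue. */
struct fragq {
	unsigned ttl;		/* remaining lifetime; lower = closer to expiry */
	unsigned nfrags;	/* fragments held by this queue */
	int dropped;
};

static unsigned
clamp_ttl(unsigned ttl)
{
	return ttl > FRAG_TTL_MAX ? FRAG_TTL_MAX : ttl;
}

/*
 * Drop the queues closest to expiry until at least half of all held
 * fragments are gone.  Returns the number of fragments dropped.
 */
static unsigned
frag_drop_half(struct fragq *q, size_t n)
{
	unsigned hist[FRAG_TTL_MAX + 1] = { 0 };
	unsigned total = 0, acc = 0, cutoff, dropped = 0;
	size_t i;

	/* Histogram of live fragments, bucketed by queue TTL. */
	for (i = 0; i < n; i++) {
		if (q[i].dropped)
			continue;
		hist[clamp_ttl(q[i].ttl)] += q[i].nfrags;
		total += q[i].nfrags;
	}

	/* Smallest TTL cutoff whose cumulative count reaches half the total. */
	for (cutoff = 0; cutoff < FRAG_TTL_MAX; cutoff++) {
		acc += hist[cutoff];
		if (2 * acc >= total)
			break;
	}

	/* Drop every live queue at or below the cutoff. */
	for (i = 0; i < n; i++) {
		if (!q[i].dropped && clamp_ttl(q[i].ttl) <= cutoff) {
			q[i].dropped = 1;
			dropped += q[i].nfrags;
		}
	}
	return dropped;
}
```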


# 1.193 12-Dec-2003 scw

Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.


# 1.192 08-Dec-2003 jonathan

Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.


# 1.191 07-Dec-2003 jonathan

KNF: s/unsigned/u_int/, in a couple of places I missed.


# 1.190 06-Dec-2003 jonathan

Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.
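A minimal sketch of a hashtable of reassembly list heads, assuming queues are keyed on source, destination, IP ID and protocol. The hash function, bucket count, and names are illustrative assumptions rather than what NetBSD or FreeBSD use; the point is that a lookup scans only one short chain instead of a single global list.

```c
#include <stddef.h>
#include <stdint.h>

#define REASS_NBUCKETS 64	/* hypothetical; a power of two so a mask works */

/* Hypothetical reassembly queue, keyed like an IPv4 fragment. */
struct reassq {
	uint32_t src, dst;
	uint16_t id;
	uint8_t proto;
	struct reassq *next;	/* bucket chain */
};

static struct reassq *reass_table[REASS_NBUCKETS];

/* Illustrative hash that folds the key into a bucket index. */
static unsigned
reass_hash(uint32_t src, uint32_t dst, uint16_t id)
{
	uint32_t h = src ^ dst ^ id;

	h ^= h >> 16;
	h ^= h >> 8;
	return h & (REASS_NBUCKETS - 1);
}

/* Scan only the matching bucket for an exact key match. */
static struct reassq *
reass_lookup(uint32_t src, uint32_t dst, uint16_t id, uint8_t proto)
{
	struct reassq *q;

	for (q = reass_table[reass_hash(src, dst, id)]; q != NULL; q = q->next)
		if (q->src == src && q->dst == dst && q->id == id &&
		    q->proto == proto)
			return q;
	return NULL;
}

/* Push a new queue onto the head of its bucket. */
static void
reass_insert(struct reassq *q)
{
	unsigned b = reass_hash(q->src, q->dst, q->id);

	q->next = reass_table[b];
	reass_table[b] = q;
}
```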


# 1.189 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.188 04-Dec-2003 scw

ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.


# 1.187 26-Nov-2003 itojun

define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.


# 1.186 24-Nov-2003 scw

For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.


# 1.185 19-Nov-2003 fvdl

Correct number of arguments to sysctl_rdint.


# 1.184 19-Nov-2003 jonathan

Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.


# 1.183 17-Nov-2003 jonathan

Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.


# 1.182 12-Nov-2003 itojun

KNF


# 1.181 11-Nov-2003 jonathan

Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.


# 1.180 10-Nov-2003 jonathan

Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add an IPv4-specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.


# 1.179 28-Sep-2003 mycroft

Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."


# 1.178 06-Sep-2003 itojun

randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.


# 1.177 06-Sep-2003 itojun

backout previous, we don't know if arc4random() corrides on reboot.


# 1.176 05-Sep-2003 itojun

initialize fragment ID with arc4random, not by time.tv_sec


# 1.175 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.174 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.173 15-Aug-2003 jonathan

(fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.


# 1.172 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.171 14-Jul-2003 itojun

correct igmp. from love


# 1.170 03-Jul-2003 itojun

minor KNF


# 1.169 30-Jun-2003 itojun

branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.


# 1.168 30-Jun-2003 itojun

fix indent


# 1.167 23-Jun-2003 martin

Make sure to include opt_foo.h if a defflag option FOO is used.


# 1.166 15-Jun-2003 matt

Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.


# 1.165 11-Apr-2003 christos

PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.


# 1.164 26-Feb-2003 matt

Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.163 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.162 12-Nov-2002 itojun

back out previous - doesn't compile


# 1.161 12-Nov-2002 itojun

update ip_mtudisc sysctl change handling.


# 1.160 10-Nov-2002 itojun

always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se


# 1.159 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.158 23-Sep-2002 itojun

revert mtudisc_timeout value to the old one if update fails


# 1.157 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


# 1.156 11-Sep-2002 itojun

correct signedness mixup in pointer passing. sync w/kame


Revision tags: gehenna-devsw-base
# 1.155 14-Aug-2002 itojun

avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.


# 1.154 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.153 13-Jun-2002 itojun

set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)


# 1.152 09-Jun-2002 itojun

whitespace


# 1.151 07-Jun-2002 itojun

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>


Revision tags: netbsd-1-6-base
# 1.150 12-May-2002 matt

branches: 1.150.2; 1.150.4;
Eliminate commons.


# 1.149 12-May-2002 wiz

Spelling fixes, from Sergey Svishchev in kern/16650.


# 1.148 07-May-2002 matt

Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.


# 1.147 18-Apr-2002 matt

Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.


Revision tags: eeh-devprop-base newlock-base
# 1.146 08-Mar-2002 thorpej

Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.


Revision tags: ifpoll-base
# 1.145 25-Feb-2002 itojun

correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>


# 1.144 24-Feb-2002 martin

Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.


# 1.143 21-Feb-2002 itojun

suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame


# 1.142 28-Nov-2001 darrenr

recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.


# 1.141 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.140 04-Nov-2001 matt

Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.


# 1.139 04-Nov-2001 matt

Change a few variables/tables to const since they are read-only.


# 1.138 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2
# 1.137 17-Sep-2001 thorpej

branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.


Revision tags: post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.136 06-Aug-2001 itojun

branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.


# 1.135 02-Jun-2001 thorpej

branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
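
To illustrate what "pre-computing the pseudo-header checksum" means, here is a minimal sketch of the 16-bit one's complement sum over the TCP/UDP pseudo-header fields (source, destination, protocol, length). For simplicity all values are in host byte order; `pseudo_hdr_sum()` and `fold16()` are hypothetical names, not the kernel's routines. A cached partial sum like this can later be combined with the payload sum, by hardware or by software, to produce the final checksum.

```c
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's complement sum. */
static uint16_t
fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * One's complement sum of the pseudo-header. Not complemented: it is
 * a partial sum meant to be cached and combined with the payload sum.
 */
uint16_t
pseudo_hdr_sum(uint32_t src, uint32_t dst, uint8_t proto, uint16_t len)
{
	uint32_t sum = 0;

	sum += src >> 16;		/* source address, high and low halves */
	sum += src & 0xffff;
	sum += dst >> 16;		/* destination address */
	sum += dst & 0xffff;
	sum += proto;			/* zero byte followed by protocol */
	sum += len;			/* TCP/UDP length */
	return fold16(sum);
}
```

For example, 192.168.0.1 to 192.168.0.2, UDP (17), length 8, gives a partial sum of 0x816d.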


# 1.134 21-May-2001 lukem

fix spelo in comment


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.133 16-Apr-2001 itojun

give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.


# 1.132 13-Apr-2001 thorpej

Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.


# 1.131 27-Mar-2001 itojun

net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.


# 1.130 02-Mar-2001 itojun

branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.


# 1.129 02-Mar-2001 itojun

reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
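
A minimal sketch of the RFC 1122 check, assuming addresses arrive in network byte order. The `from_loopback_if` flag is a hypothetical stand-in for "received on the loopback interface", where 127/8 traffic is legitimate; the exact condition in ip_input.c may be expressed differently.

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

/* True if the address (network byte order) lies in 127.0.0.0/8. */
static bool
in_loopback_net(uint32_t addr_n)
{
	return (ntohl(addr_n) >> 24) == 127;
}

/*
 * Hypothetical helper: should a packet be dropped because a 127/8
 * address appeared on the wire? Loopback-interface traffic is exempt.
 */
bool
ip_martian_loopback(uint32_t src_n, uint32_t dst_n, bool from_loopback_if)
{
	if (from_loopback_if)
		return false;
	return in_loopback_net(src_n) || in_loopback_net(dst_n);
}
```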


# 1.128 01-Mar-2001 itojun

make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.127 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.126 28-Dec-2000 thorpej

Back out the sledgehammer damage applied by wiz while I was out for
the holiday.


# 1.125 25-Dec-2000 wiz

Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.


# 1.124 22-Dec-2000 thorpej

Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.


# 1.123 14-Dec-2000 thorpej

Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.


# 1.122 24-Nov-2000 itojun

IFA_STATS stability (not complete); don't touch ip if it is NULL.


# 1.121 11-Nov-2000 thorpej

Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.


# 1.120 08-Nov-2000 ad

Update for hashinit() change.


# 1.119 13-Oct-2000 itojun

make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.


# 1.118 26-Aug-2000 itojun

make sure anonport{min,max} is not a negative number


# 1.117 25-Aug-2000 tron

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to the set minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.


# 1.116 06-Jul-2000 itojun

remove unnecessary #include <netkey/key_debug.h>. from kame.


# 1.115 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.114 10-May-2000 itojun

branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.


# 1.113 10-May-2000 itojun

correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.


# 1.112 06-May-2000 sommerfeld

Handle large offsets with very small options correctly.


# 1.111 31-Mar-2000 jdolecek

Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.


# 1.110 31-Mar-2000 jdolecek

include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)


# 1.109 30-Mar-2000 augustss

Remove register declarations.


# 1.108 30-Mar-2000 simonb

Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.


# 1.107 10-Mar-2000 thorpej

Back out previous, and adjust a comment.


# 1.106 07-Mar-2000 thorpej

Back out part of 1.104 which isn't actually needed.


# 1.105 03-Mar-2000 itojun

remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo


# 1.104 02-Mar-2000 thorpej

Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.


# 1.103 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.102 20-Feb-2000 darrenr

pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".


# 1.101 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.100 16-Feb-2000 itojun

- if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.


Revision tags: chs-ubc2-newbase
# 1.99 12-Feb-2000 thorpej

Typo (Thanks, Havard :-)


# 1.98 12-Feb-2000 thorpej

Small cosmetic change, and note a place where a statistic should be
gathered.


# 1.97 11-Feb-2000 itojun

fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp


# 1.96 01-Feb-2000 thorpej

Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.


# 1.95 31-Jan-2000 itojun

bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base comdex-fall-1999-base fvdl-softdep-base
# 1.94 26-Oct-1999 itojun

disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.


# 1.93 17-Oct-1999 sommerfeld

branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.


Revision tags: chs-ubc2-base
# 1.92 23-Jul-1999 itojun

branches: 1.92.2;
do not include unnecessary include files.


# 1.91 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.90 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.89 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.88 26-Jun-1999 sommerfeld

If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL
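
A sketch of how a hostzerobroadcast switch can change a subnet-broadcast test, assuming host byte order and a hypothetical helper `is_subnet_broadcast()`; the real check in the NetBSD sources is structured differently. The all-ones host part is always broadcast, while the all-zeros host part counts only when hostzerobroadcast is nonzero.

```c
#include <stdbool.h>
#include <stdint.h>

int hostzerobroadcast = 1;	/* nonzero default preserves old behavior */

/* Hypothetical helper: is 'addr' a broadcast address of subnet net/mask? */
bool
is_subnet_broadcast(uint32_t addr, uint32_t net, uint32_t mask)
{
	uint32_t host = addr & ~mask;

	if ((addr & mask) != net)
		return false;		/* not on this subnet at all */
	if (host == ~mask)
		return true;		/* all-ones host part */
	return hostzerobroadcast && host == 0;	/* host part zero */
}
```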


# 1.87 04-May-1999 hwr

It does not make much sense to increase a "output" counter on input.


# 1.86 03-May-1999 thorpej

In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.


# 1.85 03-May-1999 hwr

Drop packets, that have a Class-D address as source address.
Implements the first half of PR 7003.
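
A one-function sketch of the Class-D source test, assuming the address is in network byte order; `ip_src_is_class_d()` is a hypothetical name. Class D is 224.0.0.0/4, so only the top four bits need checking.

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

/* True if the source address is multicast (224.0.0.0/4, bits 1110). */
bool
ip_src_is_class_d(uint32_t src_n)
{
	return (ntohl(src_n) & 0xf0000000U) == 0xe0000000U;
}
```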


# 1.84 07-Apr-1999 proff

tiny KNF change


# 1.83 07-Apr-1999 proff

Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327


Revision tags: netbsd-1-4-base
# 1.82 27-Mar-1999 aidan

branches: 1.82.2;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.


# 1.81 26-Mar-1999 proff

security: test for ip_len < ip_hl <<2 and drop packet accordingly
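
A sketch of the length sanity checks, assuming both fields are already in host order. ip_hl counts 32-bit words, so the header occupies ip_hl << 2 bytes. The `mbuf_len` parameter is a hypothetical stand-in for the number of bytes actually received; `ip_lengths_ok()` is not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: accept only self-consistent IPv4 lengths. */
bool
ip_lengths_ok(uint8_t ip_hl, uint16_t ip_len, uint32_t mbuf_len)
{
	unsigned hlen = (unsigned)ip_hl << 2;

	if (hlen < 20)		/* shorter than a minimal IPv4 header */
		return false;
	if (ip_len < hlen)	/* total length cannot be less than the header */
		return false;
	if (mbuf_len < ip_len)	/* packet is shorter than it claims */
		return false;
	return true;
}
```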


# 1.80 19-Jan-1999 mycroft

There's just no plausible reason to byte-swap ip_id internally. It's opaque.


# 1.79 19-Jan-1999 mycroft

Don't screw with ip_len; just subtract from it where we actually use the
value.


# 1.78 19-Jan-1999 mycroft

Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.


# 1.77 11-Jan-1999 thorpej

Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.


# 1.76 19-Dec-1998 thorpej

Reverse the copyright-notice-swap. It went against existing practice.


# 1.75 18-Dec-1998 thorpej

Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.


Revision tags: kenh-if-detach-base
# 1.74 13-Nov-1998 thorpej

branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.


Revision tags: chs-ubc-base
# 1.73 08-Oct-1998 thorpej

Use the pool allocator for ipflow entries.


# 1.72 08-Oct-1998 thorpej

Use the pool allocator for ipqent structures.


# 1.71 30-Sep-1998 tls

Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.


# 1.70 09-Sep-1998 thorpej

Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.


# 1.69 09-Aug-1998 mrg

defopt PFIL_HOOKS.


Revision tags: eeh-paddr_t-base
# 1.68 17-Jul-1998 sommerfe

Fix PR5508: ipfil cut-through forwarding causes panic


# 1.67 01-Jun-1998 thorpej

Protect the ipflow_reap() call with splsoftnet.


# 1.66 24-May-1998 thorpej

Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.


# 1.65 04-May-1998 matt

Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.


# 1.64 01-May-1998 thorpej

Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.


# 1.63 29-Apr-1998 matt

Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
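
The incremental adjustment can be illustrated with RFC 1624's update rule HC' = ~(~HC + ~m + m'), where m and m' are the old and new values of the changed 16-bit word (TTL shares a word with the protocol field). This is a sketch in host byte order with hypothetical function names; the in-kernel fast path works on network-order fields and may use a shortcut specialized for a TTL decrement.

```c
#include <stdint.h>

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'), 16-bit one's complement. */
uint16_t
cksum_adjust(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
	uint32_t sum = (uint32_t)(uint16_t)~hc + (uint16_t)~old_word + new_word;

	sum = (sum & 0xffff) + (sum >> 16);	/* end-around carries */
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Full header checksum over 16-bit words, for comparison. */
uint16_t
cksum_full(const uint16_t *w, int nwords)
{
	uint32_t sum = 0;

	for (int i = 0; i < nwords; i++)
		sum += w[i];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Adjusting the checksum touches three 16-bit values instead of all ten header words, which is why it suits a fast path.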


# 1.62 29-Apr-1998 matt

defopt GATEWAY


# 1.61 29-Apr-1998 kml

change path MTU timeout value to match RFC 1191


# 1.60 29-Apr-1998 kml

Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.


# 1.59 19-Mar-1998 mrg

convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.


# 1.58 15-Feb-1998 tls

Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.


# 1.57 13-Feb-1998 tls

Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.


# 1.56 28-Jan-1998 thorpej

Use offsetof() from libkern.h


# 1.55 12-Jan-1998 scottr

Use option header file for MROUTING


# 1.54 05-Jan-1998 lukem

enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
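
A sketch of the range validation described above, under stated assumptions: `ANONPORT_FLOOR` mirrors IPPORT_RESERVED (1024), `allow_priv` models the IPNOPRIVPORTS option, and `anonport_range_ok()` is a hypothetical helper rather than the sysctl handler itself. The non-negativity requirement added later (1.118) falls out of the lower bound.

```c
#include <stdbool.h>

#define ANONPORT_FLOOR	1024	/* mirrors IPPORT_RESERVED */
#define ANONPORT_CEIL	65535	/* highest valid port number */

/* Hypothetical helper: validate a proposed (anonportmin, anonportmax). */
bool
anonport_range_ok(int min, int max, bool allow_priv)
{
	int lowest = allow_priv ? 0 : ANONPORT_FLOOR;

	if (min < lowest || max < lowest)
		return false;		/* privileged or negative port */
	if (min > ANONPORT_CEIL || max > ANONPORT_CEIL)
		return false;		/* beyond 16-bit port space */
	return min < max;		/* anonportmin must be < anonportmax */
}
```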


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base
# 1.53 18-Oct-1997 kml

branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc


# 1.52 17-Oct-1997 thorpej

Allow `subnetsarelocal' to be changed via sysctl.


Revision tags: thorpej-signal-base marc-pcmcia-base
# 1.51 29-Aug-1997 gwr

Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)


Revision tags: marc-pcmcia-bp
# 1.50 24-Jun-1997 thorpej

branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.


# 1.49 15-Apr-1997 christos

Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?


Revision tags: is-newarp-before-merge
# 1.48 26-Feb-1997 mrg

allow src-routed packets by default, per host requirements


# 1.47 25-Feb-1997 cjs

Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.


# 1.46 19-Feb-1997 cjs

Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)


# 1.45 18-Feb-1997 mrg

pseudo-device ipfilter brings in PFIL_HOOKS.


Revision tags: is-newarp-base
# 1.44 11-Jan-1997 thorpej

branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.


# 1.43 20-Dec-1996 mrg

in pfil_hooks: always reassign ip after calling hook.


# 1.42 20-Dec-1996 mrg

remove pfil_bad.


# 1.41 25-Oct-1996 thorpej

Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>


# 1.40 22-Oct-1996 veego

Fix a panic from the pfil_hooks.


# 1.39 13-Oct-1996 christos

backout previous kprintf changes


# 1.38 10-Oct-1996 christos

printf -> kprintf, sprintf -> ksprintf


# 1.37 21-Sep-1996 perry

commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.


# 1.36 14-Sep-1996 mrg

move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.


# 1.35 09-Sep-1996 mycroft

Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.


# 1.34 08-Sep-1996 mycroft

Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.


# 1.33 06-Sep-1996 mrg

add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.


# 1.32 14-Aug-1996 thorpej

Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.


# 1.31 10-Jul-1996 cgd

print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.30 16-Mar-1996 christos

branches: 1.30.4;
Fix printf format args.


# 1.29 26-Feb-1996 mrg

two more local addr changes, all done differently now (idea from charles)


# 1.28 13-Feb-1996 christos

netinet prototypes


# 1.27 16-Jan-1996 thorpej

Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.


# 1.26 15-Jan-1996 thorpej

Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.


# 1.25 21-Nov-1995 cgd

make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.24 12-Aug-1995 mycroft

splnet --> splsoftnet


# 1.23 12-Jun-1995 mycroft

Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.


# 1.22 12-Jun-1995 mycroft

Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.


# 1.21 07-Jun-1995 mycroft

Remove ip_ifmatrix completely.


# 1.20 04-Jun-1995 mycroft

Don't cast things unnecessarily.


# 1.19 04-Jun-1995 mycroft

Clean up many more casts.


# 1.18 01-Jun-1995 mycroft

Avoid byte-swapping IP addresses at run time.


# 1.17 15-May-1995 cgd

oops; forgot a '{'


# 1.16 14-May-1995 cgd

drop (and record) malformed IP fragments. Fixes pr 1030 (differently).


# 1.15 13-Apr-1995 cgd

be a bit more careful and explicit with types. (basically a large no-op.)


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.14 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'


# 1.13 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.12 14-Feb-1994 mycroft

PARANOID --> DIAGNOSTIC for inexpensive tests.


# 1.11 02-Feb-1994 hpeyerl

Multicast is no longer optional.


# 1.10 29-Jan-1994 brezak

Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.


# 1.9 10-Jan-1994 mycroft

Should compile now with or without `options MULTICAST'.


# 1.8 09-Jan-1994 mycroft

Prototype the rest.


# 1.7 08-Jan-1994 mycroft

More prototypes.


# 1.6 08-Jan-1994 mycroft

Fix some inconsistent spacing; spaces at the end of lines, etc.


# 1.5 18-Dec-1993 mycroft

Canonicalize all #includes.


# 1.4 06-Dec-1993 hpeyerl

multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

branches: 1.3.4;
more rcsid additions and file header cleanups


# 1.2 04-May-1993 cgd

make ip_input recursion checking be for -DPARANOID, and make it panic


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision