History log of /netbsd-current/sys/netinet6/icmp6.c
Revision Date Author Comments
# 1.256 24-Feb-2024 mlelstv

Deliver timestamps also to raw sockets.
Fixes PR 57955


# 1.255 09-Dec-2023 pgoyette

Modularize the COMPAT_90 code that resulted from the removal of
netinet6/nd6 from the kernel. Now, the minimal compat code can
be successfully loaded and unloaded along with the rest of the
COMPAT_90 code.

XXX pullup-10 - hopefully before RC2


Revision tags: thorpej-ifq-base thorpej-altq-separation-base netbsd-10-0-RC1 netbsd-10-base
# 1.254 28-Oct-2022 ozaki-r

branches: 1.254.2;
inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
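
The accessor pattern described in this entry can be sketched roughly as follows. This is only an illustration under stated assumptions, not the kernel's actual definitions: the structure layouts, the field names and the x_in4p_laddr macro are hypothetical stand-ins chosen for the example.

```c
#include <stdint.h>

/* Hypothetical common part shared by IPv4 and IPv6 PCBs. */
struct x_inpcb {
	int      af;      /* address family tag */
	uint16_t lport;   /* local port */
};

/* Hypothetical IPv4 PCB: embeds the common part as its first member. */
struct x_in4pcb {
	struct x_inpcb head;
	uint32_t       laddr;       /* IPv4-only data lives outside the common part */
};

/* Hypothetical IPv6 PCB: larger, but the IPv4 PCB no longer pays for it. */
struct x_in6pcb {
	struct x_inpcb head;
	uint8_t        laddr6[16];
};

/*
 * Accessor macro: callers keep passing a pointer to the common part and
 * the macro reaches into the enclosing IPv4 structure. This works because
 * head is the first member, so both structures start at the same address.
 */
#define x_in4p_laddr(inp) (((struct x_in4pcb *)(inp))->laddr)

/* Example helper: store a local address through the common pointer. */
static void
x_set_laddr(struct x_inpcb *inp, uint32_t addr)
{
	x_in4p_laddr(inp) = addr;
}
```

Callers that only hold a pointer to the common part never name the IPv4 structure directly, which is what keeps the separation invisible to them.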


# 1.253 28-Oct-2022 ozaki-r

inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).


Revision tags: bouyer-sunxi-drm-base
# 1.252 29-Aug-2022 knakahara

Add a sysctl entry to control whether a routing message is sent for RTM_DYNAMIC.

Some routing daemons require such a routing message to keep coherency.

To let the kernel send such a message, set net.inet.icmp.dynamic_rt_msg=1
for IPv4 or net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
The default (=0) is the same as before, that is, no such routing message is sent.


# 1.251 22-Aug-2022 knakahara

Add a sysctl entry to enable/disable path MTU discovery for ICMPv6 reflecting.

To use path MTU discovery for ICMPv6 reflecting, set
net.inet6.icmp6.reflect_pmtu=1. The default (=0) is the same as before, that is,
use IPV6_MINMTU.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.250 19-Feb-2021 christos

- Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
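
The difference between a size-based and an alignment-based check can be sketched as below. This is an illustration only: the X_ macro names and the X_NO_STRICT_ALIGNMENT switch are hypothetical, and the C11 _Alignof operator is used here in place of the compiler-specific __alignof mentioned in the entry.

```c
#include <stdint.h>

/* A 12-byte structure whose required alignment is only 4 bytes. */
struct x_hdr {
	uint32_t a;
	uint32_t b;
	uint32_t c;
};

/* Size-based check: only meaningful when sizeof(t) is a power of two. */
#define X_ALIGNED_BY_SIZE(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* Alignment-based check: asks the compiler for the ABI alignment of t. */
#define X_ALIGNED_POINTER(p, t) \
	((((uintptr_t)(p)) & (_Alignof(t) - 1)) == 0)

/*
 * Variant that always succeeds on CPUs that tolerate unaligned accesses.
 * X_NO_STRICT_ALIGNMENT is an assumed, hypothetical configuration macro.
 */
#ifdef X_NO_STRICT_ALIGNMENT
#define X_ACCESSIBLE_POINTER(p, t) 1
#else
#define X_ACCESSIBLE_POINTER(p, t) X_ALIGNED_POINTER(p, t)
#endif
```

For the non-primitive type above, address 8 is correctly aligned, yet the size-based mask (12 - 1) rejects it; the alignment-based mask (4 - 1) accepts it.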


# 1.249 15-Feb-2021 martin

Fix the build.
Maybe there should be a ICMP6_HDR_ALIGNMENT, but for now there is
only IP6_HDR_ALIGNMENT.


# 1.248 14-Feb-2021 christos

- centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.


# 1.247 11-Sep-2020 roy

branches: 1.247.2;
inet6: Use generic Neighbor Detection rather than IPv6 specific

No functional change intended.


# 1.246 27-Jul-2020 roy

icmp6: Remove __packed attribute from icmp6 structures

They should naturally align.
Add compile time assertions to icmp6.c to prove this.
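
A compile-time check of natural alignment might look like the sketch below. The structure is a hypothetical stand-in with the same general shape as an ICMPv6 header, and C11 _Static_assert is used for portability; the kernel's own assertion macro and structure definitions may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ICMPv6 header: 1 + 1 + 2 + 4 bytes. */
struct x_icmp6_hdr {
	uint8_t  type;
	uint8_t  code;
	uint16_t cksum;
	union {
		uint32_t data32[1];
		uint16_t data16[2];
		uint8_t  data8[4];
	} dataun;
};

/* If the compiler inserted padding, these would fail at build time. */
_Static_assert(sizeof(struct x_icmp6_hdr) == 8,
    "x_icmp6_hdr must be 8 bytes without a packed attribute");
_Static_assert(offsetof(struct x_icmp6_hdr, cksum) == 2,
    "cksum must directly follow type and code");
_Static_assert(offsetof(struct x_icmp6_hdr, dataun) == 4,
    "data must start at offset 4");
```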


# 1.245 12-Jun-2020 roy

Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.244 09-Mar-2020 roy

route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes that probe for machines
that don't exist on a subnet / prefix.


Revision tags: is-mlppp-base ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.243 06-Oct-2019 uwe

icmp6_notify_error - fix ctlfunc typedef to match pr_ctlinput,
drop the cast that is no longer necessary.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.242 22-Dec-2018 maxv

Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.


# 1.241 22-Dec-2018 maxv

Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.


Revision tags: pgoyette-compat-1126
# 1.240 25-Oct-2018 ozaki-r

Remove a leftover debug printf

Pointed out by hannken@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.239 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
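
The truncation hazard discussed in this entry can be reproduced with the sketch below. The x_uimin function and X_MIN macro are hypothetical stand-ins written for this example, and the behaviour shown assumes a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Unsigned-int minimum: arguments are converted to unsigned int on entry. */
static inline unsigned int
x_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: keeps the operand types, but evaluates its arguments twice. */
#define X_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Passing a 64-bit length to the function silently drops the high bits. */
static inline uint64_t
x_clamp_with_function(uint64_t len, unsigned int limit)
{
	return x_uimin(len, limit);   /* len is truncated to unsigned int here */
}

/* The macro compares as uint64_t, so nothing is truncated. */
static inline uint64_t
x_clamp_with_macro(uint64_t len, unsigned int limit)
{
	return X_MIN(len, limit);
}
```

With a length of 0x100000001 and a limit of 100, the function version sees the length as 1 and returns 1, while the macro version returns 100. The explicit ui prefix makes that conversion visible at the call site.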


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.238 01-Jun-2018 ozaki-r

branches: 1.238.2;
Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents the
rtentry from being accidentally released. So rt_timer handlers must release the
reference of a passed rtentry by themselves (but they didn't).


Revision tags: pgoyette-compat-0521
# 1.237 07-May-2018 maxv

Remove misleading comments.


Revision tags: pgoyette-compat-0502
# 1.236 01-May-2018 maxv

Remove now unused net_osdep.h includes, the other BSDs did the same.


# 1.235 29-Apr-2018 maxv

Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.


# 1.234 28-Apr-2018 maxv

Remove unused ipsec_var.h includes.


# 1.233 27-Apr-2018 maxv

Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of
MCLBYTES, so the area allocated is still too small.

I think it should have been MEXTMALLOC, and of course I can't test my
change.


# 1.232 26-Apr-2018 maxv

Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.


# 1.231 26-Apr-2018 maxv

Use M_UNWRITABLE, no functional change.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.230 14-Apr-2018 maxv

Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to craft reliably a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.


# 1.229 14-Apr-2018 maxv

Cosmetic, and remove one XXX (no problem).


# 1.228 14-Apr-2018 maxv

Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.


# 1.227 14-Apr-2018 maxv

Remove dead code. It is the same as the non-obsolete one, since
ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE,
and the code leads to the same errno value (EHOSTUNREACH).


# 1.226 12-Apr-2018 maxv

Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.


# 1.225 12-Apr-2018 maxv

Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.224 21-Mar-2018 roy

Sprinkle more soroverflow().


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.223 28-Feb-2018 maxv

branches: 1.223.2;
Remove unused ipsec_private.h includes.


# 1.222 26-Feb-2018 maxv

Remove redundant condition (harmless). PR/53030.


# 1.221 26-Feb-2018 maxv

Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@


# 1.220 12-Feb-2018 maxv

Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.


# 1.219 23-Jan-2018 maxv

Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.


# 1.218 23-Jan-2018 maxv

Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.


# 1.217 23-Jan-2018 maxv

Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
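
One way to avoid this kind of leak is to clear the whole rounded-up slot before filling it, as in the sketch below. The option header layout, the X_ROUNDUP macro and the x_fill_lladdr_opt helper are hypothetical and written for this example; they are not the code from the commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round x up to the next multiple of y. */
#define X_ROUNDUP(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/* Hypothetical 2-byte option header: type, then length in units of 8 bytes. */
struct x_nd_opt_hdr {
	uint8_t type;
	uint8_t len;
};

/*
 * Write a link-layer address option into buf and return its padded size,
 * or 0 if it does not fit. The whole slot is cleared first so the bytes
 * added by the roundup never carry stale memory onto the wire.
 */
static size_t
x_fill_lladdr_opt(uint8_t *buf, size_t buflen, uint8_t type,
    const uint8_t *lladdr, size_t addrlen)
{
	size_t optlen = X_ROUNDUP(sizeof(struct x_nd_opt_hdr) + addrlen, 8);

	if (optlen > buflen || optlen / 8 > UINT8_MAX)
		return 0;
	memset(buf, 0, optlen);                 /* covers the padding too */
	buf[0] = type;
	buf[1] = (uint8_t)(optlen / 8);
	memcpy(buf + sizeof(struct x_nd_opt_hdr), lladdr, addrlen);
	return optlen;
}
```

With an 8-byte link-layer address the option occupies 16 bytes, of which the last 6 are padding that would otherwise keep whatever the buffer held before.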


# 1.216 23-Jan-2018 maxv

Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.


# 1.215 23-Jan-2018 maxv

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.


Revision tags: tls-maxphys-base-20171202
# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect state of icmp/icmp6 with a mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
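
The idea of storing an index rather than a pointer can be sketched as follows. All names here are hypothetical, and the sketch deliberately leaves out the psref(9)/pserialize(9) protection that the real APIs add around the lookup; it only shows why a stale index is safer than a stale pointer.

```c
#include <stddef.h>

#define X_MAXIF 8

/* Hypothetical interface object. */
struct x_ifnet {
	unsigned int index;
	const char  *name;
};

/* Hypothetical interface collection: slot i holds interface i or NULL. */
static struct x_ifnet *x_ifindex2ifnet[X_MAXIF];

/* Hypothetical packet header: remembers the index, not a pointer. */
struct x_pkthdr {
	unsigned int rcvif_index;
};

/* Record the receiving interface of a packet by its index. */
static void
x_set_rcvif(struct x_pkthdr *ph, const struct x_ifnet *ifp)
{
	ph->rcvif_index = ifp->index;
}

/* Returns NULL if the interface has been destroyed in the meantime. */
static struct x_ifnet *
x_get_rcvif(const struct x_pkthdr *ph)
{
	if (ph->rcvif_index >= X_MAXIF)
		return NULL;
	return x_ifindex2ifnet[ph->rcvif_index];
}
```

A caller that gets NULL back knows the interface is gone and can drop the packet, which is the reason NULL checks are needed after such lookups.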


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The obtained outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
  - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route
You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
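
The conversion between the two time bases can be sketched as below. The x_clocks structure and the helper names are hypothetical and chosen for this example; they model the two kernel clocks as plain integers rather than using the kernel's own variables.

```c
#include <stdint.h>

/* Hypothetical snapshot of the two clocks, in seconds. */
struct x_clocks {
	int64_t uptime;   /* monotonic: seconds since boot */
	int64_t wall;     /* wall clock: can jump when the time is set */
};

/* Convert an uptime-based timestamp to wall-clock time for userland. */
static int64_t
x_mono_to_wall(const struct x_clocks *c, int64_t t)
{
	return t - c->uptime + c->wall;
}

/* Convert a wall-clock timestamp from userland to uptime-based time. */
static int64_t
x_wall_to_mono(const struct x_clocks *c, int64_t t)
{
	return t - c->wall + c->uptime;
}

/* An entry expires when the monotonic clock reaches its deadline. */
static int
x_expired(const struct x_clocks *c, int64_t expire_mono)
{
	return c->uptime >= expire_mono;
}
```

Because the deadline is kept in the monotonic base, setting the wall clock back by an hour does not extend or shorten the lifetime of a cache entry.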


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
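
The reference-counting discipline listed in this entry can be sketched as follows. The structure and function names are hypothetical, the freed flag stands in for actually releasing memory, and the sketch ignores locking and the other conditions the real rtfree checks.

```c
#include <stddef.h>

/* Hypothetical route entry with a reference counter. */
struct x_rtentry {
	int refcnt;
	int freed;    /* set instead of releasing memory, for illustration */
};

/* Lookup-style function: hands back the entry with its count incremented. */
static struct x_rtentry *
x_rt_lookup(struct x_rtentry *table_entry)
{
	if (table_entry == NULL)
		return NULL;
	table_entry->refcnt++;
	return table_entry;
}

/* The only way to drop a reference; the last one frees the entry. */
static void
x_rt_free(struct x_rtentry *rt)
{
	if (--rt->refcnt == 0)
		rt->freed = 1;
}
```

A caller pairs every x_rt_lookup with exactly one x_rt_free and never touches the counter directly, so the entry cannot disappear while a local variable still points at it.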


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
Should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
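
Why an array of 64-bit counters can stay layout-compatible with the old structure is sketched below. The structure, the index names and the helpers are hypothetical; the point is only that each index must land on the offset of the field it replaces.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style statistics structure: every field is 64-bit. */
struct x_icmp6stat {
	uint64_t error;
	uint64_t canterror;
	uint64_t toofreq;
};

/* New-style indices, one per former field, in the same order. */
enum {
	X_STAT_ERROR = 0,
	X_STAT_CANTERROR = 1,
	X_STAT_TOOFREQ = 2,
	X_NSTATS = 3
};

/* The layouts match when each index lands on its field's offset. */
_Static_assert(sizeof(struct x_icmp6stat) == X_NSTATS * sizeof(uint64_t),
    "array and structure must have the same size");
_Static_assert(offsetof(struct x_icmp6stat, toofreq) ==
    X_STAT_TOOFREQ * sizeof(uint64_t), "index must match field offset");

static uint64_t x_stats[X_NSTATS];

#define X_STATINC(i) (x_stats[(i)]++)

/* Export the counters in the old structure layout, as an old binary expects. */
static void
x_stats_export(struct x_icmp6stat *out)
{
	memcpy(out, x_stats, sizeof(*out));
}
```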


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sin6_family,
sin6_len, and sin6_addr) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.
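
The error convention in item 3 can be sketched as follows. This is a userland model under stated assumptions: struct demo_pcb, DEMO_OPT_ONOFF, and demo_ctloutput() are hypothetical and do not correspond to any real pr_ctloutput implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option number, for this sketch only. */
#define DEMO_OPT_ONOFF	1

struct demo_pcb {
	int dp_onoff;
};

/*
 * Set an option on a hypothetical PCB.  Returns 0 on success,
 * ENOPROTOOPT if the option is not supported at this layer, and EINVAL
 * if the option is supported but its argument is malformed.
 */
static int
demo_ctloutput(struct demo_pcb *pcb, int optname, const void *val, size_t len)
{
	int v;

	assert(pcb != NULL);
	switch (optname) {
	case DEMO_OPT_ONOFF:
		if (val == NULL || len != sizeof(v))
			return EINVAL;
		memcpy(&v, val, sizeof(v));
		if (v != 0 && v != 1)
			return EINVAL;
		pcb->dp_onoff = v;
		return 0;
	default:
		/* Unknown here; a real stack might pass it further down. */
		return ENOPROTOOPT;
	}
}
```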

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of
the address part at `slenp', if `slenp' is not NULL.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.
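
A toy model of the check described above, assuming that the macro is true when the contiguous region is either too short or not writable. struct demo_mbuf and DEMO_UNWRITABLE() are hypothetical simplifications, not the kernel's mbuf or M_UNWRITABLE().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-in for an mbuf. */
struct demo_mbuf {
	int dm_len;		/* bytes contiguous in this buffer */
	int dm_readonly;	/* nonzero if the storage is shared/read-only */
};

/* True if the first `len' bytes are not both contiguous and writable. */
#define DEMO_UNWRITABLE(m, len) \
	((m)->dm_len < (len) || (m)->dm_readonly)

/* Return 1 if the caller must pull up before writing `len' bytes. */
static int
demo_needs_pullup(const struct demo_mbuf *m, int len)
{
	assert(m != NULL);
	return DEMO_UNWRITABLE(m, len) ? 1 : 0;
}
```

A plain length comparison would miss the second case below: the data is long enough, but writing to it would scribble on shared storage.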


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).
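
The strdup()/strcpy() analogy can be sketched in userland as below. Everything here is an assumption made for illustration: struct demo_sockaddr is a simplified stand-in, calloc() replaces the per-family pool, and the 'flags' argument is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified sockaddr: length and family lead the payload. */
struct demo_sockaddr {
	unsigned char sa_len;
	unsigned char sa_family;
	unsigned char sa_data[14];
};

#define DEMO_AF_A	2	/* made-up family number */

/* Allocate a sockaddr with sa_family and sa_len filled in, or NULL. */
static struct demo_sockaddr *
demo_sockaddr_alloc(unsigned char af)
{
	struct demo_sockaddr *sa = calloc(1, sizeof(*sa));

	if (sa == NULL)
		return NULL;
	sa->sa_len = sizeof(*sa);
	sa->sa_family = af;
	return sa;
}

/* Like strcpy(): copy src over dst; the families must be alike. */
static struct demo_sockaddr *
demo_sockaddr_copy(struct demo_sockaddr *dst, const struct demo_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like strdup(): allocate a new sockaddr and copy src into it. */
static struct demo_sockaddr *
demo_sockaddr_dup(const struct demo_sockaddr *src)
{
	struct demo_sockaddr *dst = demo_sockaddr_alloc(src->sa_family);

	if (dst == NULL)
		return NULL;
	return demo_sockaddr_copy(dst, src);
}

/* Give the sockaddr back (to the heap here, to the pool in the kernel). */
static void
demo_sockaddr_free(struct demo_sockaddr *sa)
{
	free(sa);
}
```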

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.
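
A sketch of the calling convention in item 3, under loud assumptions: a string stands in for the pooled sockaddr, malloc() stands in for the pool, and the demo_ functions are hypothetical models rather than the kernel routines. What it shows is that the destination lives outside the route cache, that setting it can fail with ENOMEM, and that callers must tolerate a NULL destination.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical route cache that points at external destination storage. */
struct demo_route {
	char *ro_dst;	/* allocated destination, or NULL */
};

/* Set the cache destination; 0 on success, ENOMEM if allocation fails. */
static int
demo_rtcache_setdst(struct demo_route *ro, const char *dst)
{
	size_t n = strlen(dst) + 1;
	char *copy = malloc(n);

	if (copy == NULL)
		return ENOMEM;
	memcpy(copy, dst, n);
	free(ro->ro_dst);
	ro->ro_dst = copy;
	return 0;
}

/* Const accessor; returns NULL if no destination has been set. */
static const char *
demo_rtcache_getdst(const struct demo_route *ro)
{
	return ro->ro_dst;
}

/* Forget the destination and release its storage. */
static void
demo_rtcache_free(struct demo_route *ro)
{
	free(ro->ro_dst);
	ro->ro_dst = NULL;
}
```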

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop ( and local dos ) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
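
The chaining and invalidation described above can be modeled roughly as follows. This is a single-threaded userland sketch with hypothetical demo_ names and a hand-rolled singly linked chain; the real in_rtcache()/in_rtflush() code is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

struct demo_rtentry {
	int rt_refcnt;
};

struct demo_route {
	struct demo_rtentry *ro_rt;	/* cached route, or NULL */
	struct demo_route *ro_next;	/* next live cache on the chain */
};

/* Chain of all caches whose ro_rt is not NULL. */
static struct demo_route *demo_chain;

/* Record a newly filled cache on the chain. */
static void
demo_rtcache(struct demo_route *ro)
{
	assert(ro->ro_rt != NULL);
	ro->ro_next = demo_chain;
	demo_chain = ro;
}

/* Invalidate one cache: drop the reference, clear ro_rt, unlink it. */
static void
demo_rtflush(struct demo_route *ro)
{
	struct demo_route **pp;

	if (ro->ro_rt == NULL)
		return;
	ro->ro_rt->rt_refcnt--;
	ro->ro_rt = NULL;
	for (pp = &demo_chain; *pp != NULL; pp = &(*pp)->ro_next) {
		if (*pp == ro) {
			*pp = ro->ro_next;
			break;
		}
	}
	ro->ro_next = NULL;
}

/* Invalidate every cache on the chain. */
static void
demo_rtflushall(void)
{
	while (demo_chain != NULL)
		demo_rtflush(demo_chain);
}
```

After a flush, users of a cache see ro_rt == NULL and must look the route up again, which is the behavior the surrounding text asks every consumer to expect.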

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
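
To illustrate the bitmap reading of net.inet6.icmp6.nodeinfo, a short sketch. The macro and function names are hypothetical; only the bit positions (2^0 and 2^1) come from the text above.

```c
#include <assert.h>

#define DEMO_NI_FQDN	(1 << 0)	/* 2^0: answer ping6 -w queries */
#define DEMO_NI_ADDRS	(1 << 1)	/* 2^1: answer ping6 -a queries */

/* Return 1 if the query kind `bit' is enabled in the sysctl value. */
static int
demo_ni_allowed(int nodeinfo, int bit)
{
	assert(bit == DEMO_NI_FQDN || bit == DEMO_NI_ADDRS);
	return (nodeinfo & bit) != 0;
}
```

With this reading, a value of 3 enables both kinds of reply, 1 or 2 enables one of them, and 0 disables node information replies.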


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame
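
The two-watermark rule above might be sketched like this. The function is a hypothetical model: the entry count and limits are plain integers, and the treatment of a negative limit as "no limit" follows the revision 1.55 message rather than anything stated in this commit.

```c
#include <assert.h>

/*
 * Return 1 if one more PMTUD route entry may be created.
 * `validated' is nonzero when xx_ctlinput vouched for the traffic, in
 * which case the higher limit applies; otherwise the lower one does.
 */
static int
demo_pmtud_allowed(int nentries, int validated, int hiwat, int lowat)
{
	int limit = validated ? hiwat : lowat;

	assert(nentries >= 0);
	if (limit < 0)		/* negative value turns the limit off */
		return 1;
	return nentries < limit;
}
```

The low watermark keeps unvalidated traffic from growing the routing table without bound, while validated traffic gets more headroom.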

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
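
Event-per-second limiting of the kind mentioned here can be sketched in userland C as below. This is a minimal illustration, not the kernel routine: the names pps_state and pps_allow are hypothetical, the caller supplies the current time in whole seconds, and a negative limit is assumed to mean "no limit".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state: current one-second window and its count. */
struct pps_state {
	int64_t window_start;	/* second in which the window began */
	int count;		/* events admitted in this window */
};

/*
 * Return true if the event may proceed, false if max_per_sec events
 * have already been admitted during the current second.
 */
static bool
pps_allow(struct pps_state *st, int64_t now_sec, int max_per_sec)
{
	if (max_per_sec < 0)
		return true;		/* assumed: negative means unlimited */
	if (now_sec != st->window_start) {
		/* A new second has begun: start a fresh window. */
		st->window_start = now_sec;
		st->count = 0;
	}
	if (st->count >= max_per_sec)
		return false;
	st->count++;
	return true;
}
```

With a limit of 2, the third call within the same second is refused and the first call in the next second is admitted again.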


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was a chimera of the 03 and 05 drafts.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield changes from the 04 draft to the 05 draft, which make
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.



# 1.251 22-Aug-2022 knakahara

Add sysctl entry to enable/disable to use path MTU discovery for icmpv6 reflecting.

If we want to use path MTU discovery for icmp reflecting set
net.inet6.icmp6.reflect_pmtu=1. Default(=0) is the same as before, that is,
use IPV6_MINMTU.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.250 19-Feb-2021 christos

- Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
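
The difference between checking alignment with sizeof and with the type's alignment can be sketched as follows. The two macros are hypothetical stand-ins modeled on the description in this entry, not the kernel definitions, and the sketch uses C11 alignof where the entry names __alignof.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* A 12-byte header that, on common ABIs, needs only 4-byte alignment. */
struct hdr {
	uint32_t a;
	uint32_t b;
	uint32_t c;
};

/* Hypothetical old-style check: masks with sizeof(t) - 1. */
#define ALIGNED_BY_SIZEOF(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* Hypothetical new-style check: masks with the ABI alignment of t. */
#define ALIGNED_BY_ALIGNOF(p, t) \
	((((uintptr_t)(p)) & (alignof(t) - 1)) == 0)
```

For a 16-byte-aligned buffer buf, the address buf + 8 is suitably aligned for struct hdr, and the alignof-based check accepts it. The sizeof-based check masks with 11, which is not a power-of-two mask, and rejects the same address.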


# 1.249 15-Feb-2021 martin

Fix the build.
Maybe there should be a ICMP6_HDR_ALIGNMENT, but for now there is
only IP6_HDR_ALIGNMENT.


# 1.248 14-Feb-2021 christos

- centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.


# 1.247 11-Sep-2020 roy

branches: 1.247.2;
inet6: Use generic Neighbor Detection rather than IPv6 specific

No functional change intended.


# 1.246 27-Jul-2020 roy

icmp6: Remove __packed attribute from icmp6 structures

They should naturally align.
Add compile time assertions to icmp6.c to prove this.


# 1.245 12-Jun-2020 roy

Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.244 09-Mar-2020 roy

route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes that probe for machines
that don't exist on a subnet / prefix.


Revision tags: is-mlppp-base ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.243 06-Oct-2019 uwe

icmp6_notify_error - fix ctlfunc typedef to match pr_ctlinput,
drop the cast that is no longer necessary.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.242 22-Dec-2018 maxv

Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.


# 1.241 22-Dec-2018 maxv

Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.


Revision tags: pgoyette-compat-1126
# 1.240 25-Oct-2018 ozaki-r

Remove a leftover debug printf

Pointed out by hannken@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.239 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
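
The truncation hazard this entry describes can be reproduced in a few lines of userland C. This is a sketch under the assumption that unsigned int is 32 bits wide. The uimin below is a local re-creation for illustration and not the libkern source.

```c
#include <assert.h>
#include <stdint.h>

/* Function form, defined on unsigned int as the entry describes. */
static unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Macro form: may evaluate its arguments more than once, but it
 * compares them at their own width and so does not truncate.
 */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
```

With big = 2^32 + 5 held in a uint64_t, uimin(big, 100) silently converts big to 5 and returns 5, while MIN(big, (uint64_t)100) returns 100.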


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.238 01-Jun-2018 ozaki-r

branches: 1.238.2;
Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents the
rtentry from being accidentally released. So rt_timer handlers must release the
reference of a passed rtentry by themselves (but they didn't).


Revision tags: pgoyette-compat-0521
# 1.237 07-May-2018 maxv

Remove misleading comments.


Revision tags: pgoyette-compat-0502
# 1.236 01-May-2018 maxv

Remove now unused net_osdep.h includes, the other BSDs did the same.


# 1.235 29-Apr-2018 maxv

Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.


# 1.234 28-Apr-2018 maxv

Remove unused ipsec_var.h includes.


# 1.233 27-Apr-2018 maxv

Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of
MCLBYTES, so the area allocated is still too small.

I think it should have been MEXTMALLOC, and of course I can't test my
change.


# 1.232 26-Apr-2018 maxv

Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.


# 1.231 26-Apr-2018 maxv

Use M_UNWRITABLE, no functional change.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.230 14-Apr-2018 maxv

Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to craft reliably a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.


# 1.229 14-Apr-2018 maxv

Cosmetic, and remove one XXX (no problem).


# 1.228 14-Apr-2018 maxv

Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.


# 1.227 14-Apr-2018 maxv

Remove dead code. It is the same as the non-obsolete one, since
ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE,
and the code leads to the same errno value (EHOSTUNREACH).


# 1.226 12-Apr-2018 maxv

Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.


# 1.225 12-Apr-2018 maxv

Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.224 21-Mar-2018 roy

Sprinkle more soroverflow().


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.223 28-Feb-2018 maxv

branches: 1.223.2;
Remove unused ipsec_private.h includes.


# 1.222 26-Feb-2018 maxv

Remove redundant condition (harmless). PR/53030.


# 1.221 26-Feb-2018 maxv

Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@


# 1.220 12-Feb-2018 maxv

Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.


# 1.219 23-Jan-2018 maxv

Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.


# 1.218 23-Jan-2018 maxv

Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.


# 1.217 23-Jan-2018 maxv

Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
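
The fix pattern, clearing the bytes that roundup adds, can be sketched as below. This is an illustration and not the code from the commit: fill_lladdr_opt and ROUNDUP are hypothetical names, and the 2-byte type/length header follows the usual neighbor discovery option layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round x up to the next multiple of y (y > 0). */
#define ROUNDUP(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/*
 * Hypothetical helper: write a type/length header and an addrlen-byte
 * link-layer address into buf, zero the padding up to the next
 * multiple of 8, and return the total option length (0 on failure).
 */
static size_t
fill_lladdr_opt(unsigned char *buf, size_t buflen, unsigned char type,
    const unsigned char *addr, size_t addrlen)
{
	const size_t hdrlen = 2;	/* type + length fields */
	size_t used = hdrlen + addrlen;
	size_t total = ROUNDUP(used, 8);

	if (total > buflen || total / 8 > 255)
		return 0;
	buf[0] = type;
	buf[1] = (unsigned char)(total / 8);	/* units of 8 octets */
	memcpy(buf + hdrlen, addr, addrlen);
	/* Clear the padding so stale buffer contents are never sent. */
	memset(buf + used, 0, total - used);
	return total;
}
```

With a 6-byte address the option is exactly 8 bytes and has no padding. With an 8-byte address it grows to 16 bytes, and the last 6 bytes are the padding that must be zeroed.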


# 1.216 23-Jan-2018 maxv

Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.


# 1.215 23-Jan-2018 maxv

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.


Revision tags: tls-maxphys-base-20171202
# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
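
The idea of storing an index and looking the interface up on use can be sketched as follows. This is a simplified single-threaded illustration with hypothetical names (if_table, pkt, pkt_get_rcvif). It leaves out the psref(9) and pserialize(9) protection that the real APIs rely on.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IFS 8

struct iface {
	int index;
	const char *name;
};

/* Hypothetical stand-in for the interface collection, keyed by index. */
static struct iface *if_table[MAX_IFS];

/* The packet remembers only the index of its receiving interface. */
struct pkt {
	int rcvif_index;
};

/*
 * Resolve the receiving interface at the time of use. The result is
 * NULL if the index is out of range or the interface has gone away,
 * so callers must check for NULL.
 */
static struct iface *
pkt_get_rcvif(const struct pkt *p)
{
	if (p->rcvif_index <= 0 || p->rcvif_index >= MAX_IFS)
		return NULL;
	return if_table[p->rcvif_index];
}
```

Once an interface is removed from the table, a packet that still carries its index resolves to NULL instead of a dangling pointer.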


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
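
A minimal userland sketch of the idea described above, under stated
assumptions: struct toy_clocks and the toy_expire_* helpers are hypothetical
names that do not appear in the kernel source. Expirations are kept on the
monotonic clock and converted only at the userland boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's time_uptime and time_second. */
struct toy_clocks {
	int64_t uptime;	/* monotonic seconds since boot */
	int64_t wall;	/* wall-clock seconds; may leap */
};

/* Record an expiration relative to the monotonic clock. */
static int64_t
toy_expire_set(const struct toy_clocks *c, int64_t lifetime)
{
	assert(lifetime >= 0);
	return c->uptime + lifetime;
}

/* An entry has expired once the monotonic clock reaches its deadline. */
static int
toy_expired(const struct toy_clocks *c, int64_t expire)
{
	return c->uptime >= expire;
}

/* Convert an uptime-based deadline to wall-clock time for userland. */
static int64_t
toy_expire_to_wall(const struct toy_clocks *c, int64_t expire)
{
	return expire - c->uptime + c->wall;
}

/* Convert a wall-clock deadline from userland back to uptime. */
static int64_t
toy_expire_from_wall(const struct toy_clocks *c, int64_t wall_expire)
{
	return wall_expire - c->wall + c->uptime;
}
```

In this sketch a jump in the wall clock changes what toy_expire_to_wall()
reports but never changes whether toy_expired() fires.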


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
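
The discipline listed above can be sketched in userland C. This is a
simplified illustration, not the route.c implementation: struct toy_rtentry,
toy_rtalloc(), toy_rtref() and toy_rtfree() are hypothetical names, and the
real rtfree() has additional conditions that are omitted here.

```c
#include <assert.h>
#include <stdlib.h>

static int toy_live;	/* number of entries not yet freed */

struct toy_rtentry {
	int refcnt;
};

/* Functions returning an entry hand it back with a reference held. */
static struct toy_rtentry *
toy_rtalloc(void)
{
	struct toy_rtentry *rt = calloc(1, sizeof(*rt));

	if (rt == NULL)
		return NULL;
	rt->refcnt = 1;
	toy_live++;
	return rt;
}

/* Take an additional reference before using the entry elsewhere. */
static void
toy_rtref(struct toy_rtentry *rt)
{
	assert(rt->refcnt > 0);
	rt->refcnt++;
}

/* Always release through this function; never decrement directly. */
static void
toy_rtfree(struct toy_rtentry *rt)
{
	assert(rt->refcnt > 0);
	if (--rt->refcnt == 0) {
		free(rt);
		toy_live--;
	}
}
```

As in the note above, the count in this sketch only keeps the entry from
being freed; it does nothing to prevent concurrent updates.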


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if ipsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
Should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
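
A minimal sketch of why such a conversion can stay ABI-compatible, assuming
every field of the old structure was a 64-bit counter. The names here
(struct toy_oldstat, TOY_STAT_*, toy_stats) are hypothetical and are not the
kernel's actual identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style structure: one named 64-bit counter per event. */
struct toy_oldstat {
	uint64_t s_error;
	uint64_t s_toofreq;
	uint64_t s_badcode;
};

/* New style: the same counters addressed by index. */
enum {
	TOY_STAT_ERROR = 0,
	TOY_STAT_TOOFREQ = 1,
	TOY_STAT_BADCODE = 2,
	TOY_NSTATS = 3
};

static uint64_t toy_stats[TOY_NSTATS];

/* The array must have exactly the layout old binaries expect. */
_Static_assert(sizeof(struct toy_oldstat) == sizeof(toy_stats),
    "array and structure sizes differ");
_Static_assert(offsetof(struct toy_oldstat, s_toofreq) ==
    TOY_STAT_TOOFREQ * sizeof(uint64_t), "counter offset differs");

static void
toy_statinc(unsigned int idx)
{
	assert(idx < TOY_NSTATS);
	toy_stats[idx]++;
}

/* Export the counters to a consumer that still uses the old structure. */
static void
toy_export(struct toy_oldstat *out)
{
	memcpy(out, toy_stats, sizeof(*out));
}
```

Because the offsets match, a consumer compiled against the old structure
reads the same values from the exported bytes.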


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sin6_family,
sin6_len, and sin6_addr) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
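
Item 4 can be illustrated with a small userland sketch. The structure
layouts and the toy_* names are assumptions made for illustration; only the
idea of a per-domain list of live route caches (dom_rtcache) that a
flush-all routine walks and invalidates comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rtentry {
	int dummy;
};

struct toy_route {
	struct toy_rtentry *ro_rt;	/* cached route, or NULL */
	struct toy_route *ro_next;	/* next live cache in the domain */
};

struct toy_domain {
	struct toy_route *dom_rtcache;	/* list of live route caches */
};

/* Cache rt in ro and put ro on the domain's list of live caches. */
static void
toy_rtcache(struct toy_domain *dom, struct toy_route *ro,
    struct toy_rtentry *rt)
{
	assert(ro->ro_rt == NULL && rt != NULL);
	ro->ro_rt = rt;
	ro->ro_next = dom->dom_rtcache;
	dom->dom_rtcache = ro;
}

/* Walk the domain's list and invalidate every cache on it. */
static void
toy_rtflushall(struct toy_domain *dom)
{
	struct toy_route *ro;

	while ((ro = dom->dom_rtcache) != NULL) {
		dom->dom_rtcache = ro->ro_next;
		ro->ro_rt = NULL;
		ro->ro_next = NULL;
	}
}
```

After toy_rtflushall() every user of a cache sees a NULL route and has to
look the destination up again.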


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
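
The ::1 example above can be sketched as a userland check. This is only an
illustration of the rule, not the kernel's scope code: toy_scope_violation()
and toy_check() are hypothetical helpers, and only the loopback case from
the example is covered.

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Return nonzero when a packet with this source and destination must
 * not leave the node: ::1 is only meaningful within a single node.
 */
static int
toy_scope_violation(const struct in6_addr *src, const struct in6_addr *dst)
{
	return IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst);
}

/* Convenience wrapper taking textual addresses; -1 on a parse error. */
static int
toy_check(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return -1;
	return toy_scope_violation(&s, &d);
}
```

With this check the bind("::1") followed by sendto(some global address)
sequence shown above would be rejected instead of being sent on the wire.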


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
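
The bitmap semantics described in this entry can be sketched in a few lines of C. This is only an illustration: the macro and function names below are hypothetical and are not taken from icmp6.c; only the bit positions (2^0 for ping6 -w, 2^1 for ping6 -a) come from the commit message.

```c
/* Hypothetical names; only the bit positions come from the log entry. */
#define NI_BIT_HOSTNAME	(1 << 0)	/* governs replies to "ping6 -w" */
#define NI_BIT_ADDRS	(1 << 1)	/* governs replies to "ping6 -a" */

/* Return nonzero if the given query class is enabled in the bitmap. */
int
ni_query_enabled(int nodeinfo, int bit)
{
	return (nodeinfo & bit) != 0;
}
```

Under these assumptions, nodeinfo=1 would answer only hostname queries, and nodeinfo=3 would answer both kinds.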


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
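
For readers unfamiliar with event-per-second limiting, a minimal user-space sketch of the idea follows. It is not the kernel routine: the structure, the function name, and the choice to pass the current time as an argument are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-limiter state: the second being counted and its tally. */
struct pps_state {
	time_t	last_sec;
	int	count;
};

/*
 * Return true if one more event may be emitted during second "now",
 * allowing at most "maxpps" events per one-second window.
 */
bool
pps_allow(struct pps_state *st, time_t now, int maxpps)
{
	if (now != st->last_sec) {
		/* A new one-second window has started: reset the tally. */
		st->last_sec = now;
		st->count = 0;
	}
	if (st->count >= maxpps)
		return false;
	st->count++;
	return true;
}
```

A caller would consult pps_allow() before sending each ICMPv6 error and silently drop the error when it returns false.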


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.254 28-Oct-2022 ozaki-r

inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).


# 1.253 28-Oct-2022 ozaki-r

inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).


Revision tags: bouyer-sunxi-drm-base
# 1.252 29-Aug-2022 knakahara

Add sysctl entry to control to send routing message for RTM_DYNAMIC.

Some routing daemons require such routing message to keep coherency.

If we want to let kernel send such message, set net.inet.icmp.dynamic_rt_msg=1
for IPv4, net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
Default(=0) is the same as before, that is, not send such routing message.


# 1.251 22-Aug-2022 knakahara

Add sysctl entry to enable/disable to use path MTU discovery for icmpv6 reflecting.

If we want to use path MTU discovery for icmp reflecting set
net.inet6.icmp6.reflect_pmtu=1. Default(=0) is the same as before, that is,
use IPV6_MINMTU.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.250 19-Feb-2021 christos

- Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
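
The difference between a sizeof-based and an alignof-based check can be seen with a type whose size is not a power of two. The sketch below uses hypothetical stand-ins, not the kernel macros, and spells the operator as the C11 keyword _Alignof rather than the __alignof named in the commit. It assumes uint32_t needs at most 4-byte alignment.

```c
#include <stdint.h>

/* Hypothetical header: 20 bytes long, but aligned like its uint32_t member. */
struct demo_hdr {
	uint8_t		type;
	uint8_t		code;
	uint16_t	cksum;
	uint32_t	data[4];
};

/* Old style: mask derived from the size of the type. */
int
aligned_by_sizeof(uintptr_t addr)
{
	return (addr & (sizeof(struct demo_hdr) - 1)) == 0;
}

/* New style: mask derived from the ABI alignment of the type. */
int
aligned_by_alignof(uintptr_t addr)
{
	return (addr & (_Alignof(struct demo_hdr) - 1)) == 0;
}
```

For an address such as 0x1010 the sizeof-based test reports a misalignment that does not exist (the mask is 19), while the alignof-based test accepts it.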


# 1.249 15-Feb-2021 martin

Fix the build.
Maybe there should be a ICMP6_HDR_ALIGNMENT, but for now there is
only IP6_HDR_ALIGNMENT.


# 1.248 14-Feb-2021 christos

- centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.


# 1.247 11-Sep-2020 roy

branches: 1.247.2;
inet6: Use generic Neighbor Detection rather than IPv6 specific

No functional change intended.


# 1.246 27-Jul-2020 roy

icmp6: Remove __packed attribute from icmp6 structures

They should naturally align.
Add compile time assertions to icmp6.c to prove this.
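
A compile-time check of this kind might look like the sketch below. The structure is a simplified, hypothetical stand-in for an ICMPv6 header, and the checks use C11 _Static_assert, not whatever assertion macro icmp6.c itself uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an 8-byte ICMPv6 header; no __packed attribute. */
struct demo_icmp6_hdr {
	uint8_t		type;
	uint8_t		code;
	uint16_t	cksum;
	uint32_t	data;
};

/* The build fails if the compiler inserts padding anywhere. */
_Static_assert(sizeof(struct demo_icmp6_hdr) == 8,
    "demo_icmp6_hdr must be exactly 8 bytes");
_Static_assert(offsetof(struct demo_icmp6_hdr, cksum) == 2,
    "cksum must follow type and code directly");
_Static_assert(offsetof(struct demo_icmp6_hdr, data) == 4,
    "data must start at byte 4");
```

Because every member already sits at a multiple of its own size, the natural layout matches the wire format and the packed attribute adds nothing.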


# 1.245 12-Jun-2020 roy

Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.244 09-Mar-2020 roy

route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes that probe for machines
that don't exist on a subnet / prefix.


Revision tags: is-mlppp-base ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.243 06-Oct-2019 uwe

icmp6_notify_error - fix ctlfunc typedef to match pr_ctlinput,
drop the cast that is no longer necessary.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.242 22-Dec-2018 maxv

Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.


# 1.241 22-Dec-2018 maxv

Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.


Revision tags: pgoyette-compat-1126
# 1.240 25-Oct-2018 ozaki-r

Remove a leftover debug printf

Pointed out by hannken@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.239 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
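
The truncation hazard that motivated the rename can be reproduced in user space. In this sketch uimin() is a local stand-in for the libkern function, the two clamp_* helpers are hypothetical, and unsigned int is assumed to be 32 bits wide.

```c
#include <stdint.h>

/* Stand-in for the libkern helper: defined on unsigned int only. */
static inline unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro; note that it may evaluate its arguments twice. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Both 64-bit arguments are silently narrowed before the comparison. */
uint64_t
clamp_truncating(uint64_t len, uint64_t limit)
{
	return uimin(len, limit);
}

/* The comparison is done at full 64-bit width. */
uint64_t
clamp_exact(uint64_t len, uint64_t limit)
{
	return MIN(len, limit);
}
```

With len = 0x100000001 and limit = 0x200000000, the first helper compares 1 against 0 after narrowing, while the second returns len unchanged.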


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.238 01-Jun-2018 ozaki-r

branches: 1.238.2;
Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents the
rtentry from being accidentally released. So rt_timer handlers must release the reference
of a passed rtentry by themselves (but they didn't).


Revision tags: pgoyette-compat-0521
# 1.237 07-May-2018 maxv

Remove misleading comments.


Revision tags: pgoyette-compat-0502
# 1.236 01-May-2018 maxv

Remove now unused net_osdep.h includes, the other BSDs did the same.


# 1.235 29-Apr-2018 maxv

Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.


# 1.234 28-Apr-2018 maxv

Remove unused ipsec_var.h includes.


# 1.233 27-Apr-2018 maxv

Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of
MCLBYTES, so the area allocated is still too small.

I think it should have been MEXTMALLOC, and of course I can't test my
change.


# 1.232 26-Apr-2018 maxv

Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.


# 1.231 26-Apr-2018 maxv

Use M_UNWRITABLE, no functional change.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.230 14-Apr-2018 maxv

Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to craft reliably a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.


# 1.229 14-Apr-2018 maxv

Cosmetic, and remove one XXX (no problem).


# 1.228 14-Apr-2018 maxv

Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.


# 1.227 14-Apr-2018 maxv

Remove dead code. It is the same as the non-obsolete one, since
ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE,
and the code leads to the same errno value (EHOSTUNREACH).


# 1.226 12-Apr-2018 maxv

Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.


# 1.225 12-Apr-2018 maxv

Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.224 21-Mar-2018 roy

Sprinkle more soroverflow().


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.223 28-Feb-2018 maxv

branches: 1.223.2;
Remove unused ipsec_private.h includes.


# 1.222 26-Feb-2018 maxv

Remove redundant condition (harmless). PR/53030.


# 1.221 26-Feb-2018 maxv

Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@


# 1.220 12-Feb-2018 maxv

Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.


# 1.219 23-Jan-2018 maxv

Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.


# 1.218 23-Jan-2018 maxv

Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.


# 1.217 23-Jan-2018 maxv

Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
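
A user-space sketch of the pattern behind this fix follows. The function name, the two-byte option header layout, and the type value are illustrative assumptions, not code from icmp6.c. The point is that the whole rounded-up slot is cleared before the meaningful bytes are written, so the padding never carries stale memory.

```c
#include <stddef.h>
#include <string.h>

/* Round x up to the next multiple of 8. */
#define ROUNDUP8(x)	(((size_t)(x) + 7) & ~(size_t)7)

#define DEMO_OPT_HDRLEN	2	/* assumed layout: one type byte, one length byte */

/*
 * Write a link-layer address option into buf and return the number of
 * bytes used, or 0 if buf is too small.
 */
size_t
fill_lladdr_opt(unsigned char *buf, size_t buflen,
    const unsigned char *lladdr, size_t addrlen)
{
	size_t total = ROUNDUP8(DEMO_OPT_HDRLEN + addrlen);

	if (total > buflen)
		return 0;
	memset(buf, 0, total);			/* clears the padding too */
	buf[0] = 2;				/* illustrative option type */
	buf[1] = (unsigned char)(total >> 3);	/* length in 8-byte units */
	memcpy(buf + DEMO_OPT_HDRLEN, lladdr, addrlen);
	return total;
}
```

With an 8-byte address the slot is 16 bytes, of which the last 6 are padding; without the memset() those 6 bytes would go out on the wire with whatever the buffer held before.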


# 1.216 23-Jan-2018 maxv

Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.


# 1.215 23-Jan-2018 maxv

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.


Revision tags: tls-maxphys-base-20171202
# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure ifp returned from in6_select* functions is psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
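
A minimal userland sketch of the index-instead-of-pointer idea follows.
It is not the kernel code: every name in it (ifnet_sketch, pkt_sketch,
if_table and the sketch_* functions) is hypothetical, and it leaves out
the psref(9)/pserialize(9) protection that the real m_get_rcvif* APIs
provide. It only shows why a lookup by index can return NULL after an
interface goes away, and therefore why callers need a NULL check.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_IFS 8	/* hypothetical table size */

/* Hypothetical stand-in for struct ifnet. */
struct ifnet_sketch {
	unsigned int index;	/* 1-based; 0 means "no interface" */
	const char *name;
};

/* Hypothetical stand-in for the ifindex2ifnet collection. */
static struct ifnet_sketch *if_table[SKETCH_MAX_IFS];

/* Hypothetical stand-in for an mbuf packet header. */
struct pkt_sketch {
	unsigned int rcvif_index;	/* an index, not a pointer */
};

static void
sketch_if_attach(struct ifnet_sketch *ifp)
{
	assert(ifp->index > 0 && ifp->index < SKETCH_MAX_IFS);
	if_table[ifp->index] = ifp;
}

static void
sketch_if_detach(struct ifnet_sketch *ifp)
{
	/* The packet keeps its index; only the table slot is cleared. */
	if_table[ifp->index] = NULL;
}

static void
sketch_set_rcvif(struct pkt_sketch *p, const struct ifnet_sketch *ifp)
{
	p->rcvif_index = ifp->index;
}

/* Returns NULL if the interface is gone; callers must check. */
static struct ifnet_sketch *
sketch_get_rcvif(const struct pkt_sketch *p)
{
	if (p->rcvif_index == 0 || p->rcvif_index >= SKETCH_MAX_IFS)
		return NULL;
	return if_table[p->rcvif_index];
}
```

A packet that outlives its interface then sees NULL from the lookup
rather than a dangling pointer.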


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterparts of m_get_rcvif, which will come in another
commit, hide the internals of rcvif operations, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The obtained outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
  - RTF_CONNECTED is added
    - It has the same value as RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  an L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can see the details of the behavior changes in the diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, say by settimeofday(2), according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
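
The following is a minimal userland sketch of the approach, under the
assumption that a pair of plain counters can stand in for the kernel's
time_second and time_uptime. All names (struct clocks, cache_entry and
the helper functions) are hypothetical and do not appear in the tree.
It shows an expiry deadline kept in uptime units, which a wall-clock
leap cannot disturb, and the conversion in both directions for values
that cross the kernel/userland boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's time_second and time_uptime. */
struct clocks {
	int64_t wall;	/* wall-clock seconds; may leap on settimeofday(2) */
	int64_t uptime;	/* seconds since boot; increases monotonically */
};

/* Hypothetical cache entry whose deadline is kept in uptime units. */
struct cache_entry {
	int64_t expire_uptime;
};

static void
entry_arm(struct cache_entry *e, const struct clocks *c, int64_t lifetime)
{
	assert(lifetime >= 0);
	e->expire_uptime = c->uptime + lifetime;
}

static bool
entry_expired(const struct cache_entry *e, const struct clocks *c)
{
	return c->uptime >= e->expire_uptime;
}

/* Uptime-based value -> wall-clock value, for handing to userland. */
static int64_t
uptime_to_wall(const struct clocks *c, int64_t t)
{
	return t - c->uptime + c->wall;
}

/* Wall-clock value from userland -> uptime-based value. */
static int64_t
wall_to_uptime(const struct clocks *c, int64_t t)
{
	return t - c->wall + c->uptime;
}
```

The two conversions are inverses of each other as long as both use the
same clock snapshot.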


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
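
A minimal single-threaded sketch of the discipline described above,
with hypothetical names throughout (rt_sketch and the rt_sketch_*
functions are illustrations, not the route.c API): a function that
returns an entry increments its count, and the caller releases it with
a free routine rather than touching the counter directly. As noted
above, the count only keeps the entry from being freed; it does not
protect against concurrent updates, and this sketch does not attempt to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of entries currently allocated; lets a test observe frees. */
static int live_entries;

/* Hypothetical stand-in for struct rtentry. */
struct rt_sketch {
	int refcnt;
	int value;
};

/* Allocate an entry holding one reference (the table's own). */
static struct rt_sketch *
rt_sketch_alloc(int value)
{
	struct rt_sketch *rt = malloc(sizeof(*rt));

	if (rt == NULL)
		return NULL;
	rt->refcnt = 1;
	rt->value = value;
	live_entries++;
	return rt;
}

/* Functions returning an entry increment its count for the caller. */
static struct rt_sketch *
rt_sketch_lookup(struct rt_sketch *slot)
{
	if (slot == NULL)
		return NULL;
	slot->refcnt++;
	return slot;
}

/* The only way to drop a reference; frees the entry on the last one. */
static void
rt_sketch_free(struct rt_sketch *rt)
{
	assert(rt != NULL && rt->refcnt > 0);
	if (--rt->refcnt == 0) {
		free(rt);
		live_entries--;
	}
}
```

With this shape, an entry removed from the table stays valid for a
caller that is still holding a reference from a lookup.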


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if ipsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
Should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
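
A minimal sketch of why a structure-to-array conversion of this kind
can stay ABI-compatible, assuming the old structure consists solely of
uint64_t counters. The field and index names below are hypothetical and
far fewer than the real statistics; the point is only that an array
indexed in the old field order has the same memory layout, so the bytes
handed to an old binary are unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style structure, as an old binary would expect it. */
struct old_stat_sketch {
	uint64_t s_error;
	uint64_t s_tooshort;
	uint64_t s_reflect;
};

/* Hypothetical new-style indices into an array of counters. */
enum {
	SKETCH_STAT_ERROR = 0,
	SKETCH_STAT_TOOSHORT = 1,
	SKETCH_STAT_REFLECT = 2,
	SKETCH_NSTATS = 3
};

/* The layouts match only if each index lines up with its old field. */
_Static_assert(sizeof(struct old_stat_sketch) ==
    SKETCH_NSTATS * sizeof(uint64_t), "size mismatch");
_Static_assert(offsetof(struct old_stat_sketch, s_tooshort) ==
    SKETCH_STAT_TOOSHORT * sizeof(uint64_t), "s_tooshort offset mismatch");
_Static_assert(offsetof(struct old_stat_sketch, s_reflect) ==
    SKETCH_STAT_REFLECT * sizeof(uint64_t), "s_reflect offset mismatch");

/* Export the array in the form an old consumer reads as a structure. */
static void
sketch_export(const uint64_t stats[SKETCH_NSTATS],
    struct old_stat_sketch *out)
{
	assert(stats != NULL && out != NULL);
	memcpy(out, stats, sizeof(*out));
}
```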


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
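
The src=::1 case above can be illustrated with a small user-space predicate. This is a sketch only: `loopback_src_leaves_node()` is a hypothetical helper written for illustration, not the kernel's scope boundary check, and it covers only the loopback example given in the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: true when the source is ::1 but the destination
 * is not, i.e. the packet would carry a node-local source address off
 * the node. Unparsable addresses yield false.
 */
static bool
loopback_src_leaves_node(const char *src, const char *dst)
{
	struct in6_addr s, d;

	assert(src != NULL && dst != NULL);
	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return false;
	return IN6_IS_ADDR_LOOPBACK(&s) && !IN6_IS_ADDR_LOOPBACK(&d);
}
```

A stricter scope check would reject the send when this predicate is true; the 2001:db8::/32 documentation prefix is a convenient stand-in for "some global IPv6 address" when trying it out.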


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
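
The shape of such an alignment predicate can be sketched as follows. The macro name, the helper, and the 4-byte boundary (mask 3) are assumptions made for illustration; only the __NO_STRICT_ALIGNMENT switch comes from the entry above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a *_HDR_ALIGNED_P() style macro. On platforms
 * that define __NO_STRICT_ALIGNMENT it is the constant 1, so the
 * compiler can discard the test and the realignment path entirely.
 */
#ifdef __NO_STRICT_ALIGNMENT
#define MODEL_HDR_ALIGNED_P(p)	1
#else
#define MODEL_HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/* Returns 1 when an align-if-needed path would have to be taken. */
static int
model_needs_realign(const void *hdr)
{
	assert(hdr != NULL);
	return !MODEL_HDR_ALIGNED_P(hdr);
}
```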


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
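
A bitmap sysctl of this kind is typically tested one bit at a time. The sketch below assumes hypothetical flag names; only the bit positions (2^0 for ping6 -w, 2^1 for ping6 -a) are taken from the entry above.

```c
#include <assert.h>

/* Hypothetical names for the two documented bits. */
#define MODEL_NI_NAME	(1 << 0)	/* 2^0: ping6 -w queries */
#define MODEL_NI_ADDRS	(1 << 1)	/* 2^1: ping6 -a queries */

/* Returns nonzero when the given query class is enabled in the bitmap. */
static int
model_ni_allowed(int nodeinfo, int bit)
{
	assert(bit == MODEL_NI_NAME || bit == MODEL_NI_ADDRS);
	return (nodeinfo & bit) != 0;
}
```

Under this reading, a value of 3 would enable both query classes and 0 would disable both.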


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
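
The two-threshold rule above can be written as a small admission test. This is a sketch, not the kernel code: `model_pmtud_may_add()` is a hypothetical name, and treating "up to" as "the count after adding must not exceed the limit" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical admission check for a new path MTU discovery route entry.
 * nentries is the number of entries that already exist. Traffic that was
 * validated by a ctlinput handler is held to hiwat; anything else is held
 * to the lower limit lowat, which bounds routing table growth.
 */
static bool
model_pmtud_may_add(int nentries, bool validated, int hiwat, int lowat)
{
	int limit = validated ? hiwat : lowat;

	assert(nentries >= 0);
	return nentries < limit;
}
```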


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
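
Event-per-second limiting can be modelled in user space as a counter that resets whenever the second changes. The sketch below is an assumption-laden illustration: the structure, the function name, and the choice that a limit of 0 blocks every event are hypothetical and do not describe the kernel routine's exact semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-limiter state. */
struct model_pps_state {
	time_t last_sec;	/* second in which count was last reset */
	int count;		/* events admitted during last_sec */
};

/* Returns true if the event may proceed; at most maxpps events per second. */
static bool
model_pps_check(struct model_pps_state *st, time_t now, int maxpps)
{
	assert(st != NULL);
	if (now != st->last_sec) {
		st->last_sec = now;
		st->count = 0;
	}
	if (st->count >= maxpps)
		return false;
	st->count++;
	return true;
}
```

An ICMPv6 error path would call such a check before building the error packet and simply drop the error when it returns false.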


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.
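
The first byte of an IPv6 header carries the 4-bit version in its high nibble and the top 4 bits of the traffic class in its low nibble, so writing the version with a plain assignment clobbers the latter. A masked update avoids that. The sketch below uses hypothetical `MODEL_` constants chosen to match that layout; it is an illustration, not the change made in this revision.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for the first header byte. */
#define MODEL_IPV6_VERSION	0x60	/* version 6 in the high nibble */
#define MODEL_IPV6_VERSION_MASK	0xf0

static_assert((MODEL_IPV6_VERSION & ~MODEL_IPV6_VERSION_MASK) == 0,
    "version value must fit inside the version mask");

/* Set the version bits while preserving the traffic class bits. */
static uint8_t
model_set_version(uint8_t vfc)
{
	vfc = (uint8_t)(vfc & ~MODEL_IPV6_VERSION_MASK);
	vfc = (uint8_t)(vfc | MODEL_IPV6_VERSION);
	return vfc;
}
```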


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.252 29-Aug-2022 knakahara

Add sysctl entry to control to send routing message for RTM_DYNAMIC.

Some routing daemons require such routing message to keep coherency.

If we want to let kernel send such message, set net.inet.icmp.dynamic_rt_msg=1
for IPv4, net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
Default(=0) is the same as before, that is, not send such routing message.


# 1.251 22-Aug-2022 knakahara

Add sysctl entry to enable/disable to use path MTU discovery for icmpv6 reflecting.

If we want to use path MTU discovery for icmp reflecting set
net.inet6.icmp6.reflect_pmtu=1. Default(=0) is the same as before, that is,
use IPV6_MINMTU.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.250 19-Feb-2021 christos

- Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
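
Why __alignof(t) is the better mask than sizeof(t) shows up as soon as the type is a structure whose size is not a power of two. The sketch below uses hypothetical macro and struct names and the C11 `_Alignof` operator in place of `__alignof`; it is an illustration, not the kernel's macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical non-primitive type: 12 bytes, aligned like uint32_t. */
struct model_hdr {
	uint32_t a, b, c;
};
static_assert(sizeof(struct model_hdr) == 12,
    "sketch assumes a 12-byte struct");

/* sizeof(t) - 1 is 11 here, which is not a valid alignment mask. */
#define MODEL_ALIGNED_BY_SIZEOF(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* _Alignof(t) - 1 is the mask the ABI actually requires. */
#define MODEL_ALIGNED_BY_ALIGNOF(p, t) \
	((((uintptr_t)(p)) & (_Alignof(t) - 1)) == 0)
```

On a platform where `uint32_t` needs 4-byte alignment, address 8 is a perfectly good place for a `struct model_hdr`, yet the sizeof-based test rejects it because 8 & 11 is nonzero.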


# 1.249 15-Feb-2021 martin

Fix the build.
Maybe there should be a ICMP6_HDR_ALIGNMENT, but for now there is
only IP6_HDR_ALIGNMENT.


# 1.248 14-Feb-2021 christos

- centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.


# 1.247 11-Sep-2020 roy

branches: 1.247.2;
inet6: Use generic Neighbor Detection rather than IPv6 specific

No functional change intended.


# 1.246 27-Jul-2020 roy

icmp6: Remove __packed attribute from icmp6 structures

They should naturally align.
Add compile time assertions to icmp6.c to prove this.
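
A compile-time size check of this kind can be sketched with the C11 `static_assert`. The structure below is a hypothetical stand-in with an ICMPv6-like header layout, and the kernel may well use its own assertion macro; the point is only that a naturally aligned layout has no padding, so dropping a packed attribute leaves the size unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: 1 + 1 + 2 + 4 bytes, no packed attribute. */
struct model_icmp6_hdr {
	uint8_t type;
	uint8_t code;
	uint16_t cksum;
	uint32_t data;
};

/* The build fails if the compiler inserts any padding. */
static_assert(sizeof(struct model_icmp6_hdr) == 8,
    "model_icmp6_hdr must be 8 bytes without packing");
static_assert(offsetof(struct model_icmp6_hdr, data) == 4,
    "data must start at offset 4");
```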


# 1.245 12-Jun-2020 roy

Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.244 09-Mar-2020 roy

route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that don't exist on a subnet / prefix.


Revision tags: is-mlppp-base ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.243 06-Oct-2019 uwe

icmp6_notify_error - fix ctlfunc typedef to match pr_ctlinput,
drop the cast that is no longer necessary.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.242 22-Dec-2018 maxv

Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.


# 1.241 22-Dec-2018 maxv

Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.


Revision tags: pgoyette-compat-1126
# 1.240 25-Oct-2018 ozaki-r

Remove a leftover debug printf

Pointed out by hannken@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.239 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
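
The truncation hazard this rename guards against can be reproduced in user space. The sketch below assumes a 32-bit unsigned int; `model_uimin()` and `MODEL_MIN()` are hypothetical stand-ins for a function defined on unsigned int and for a type-preserving macro, respectively.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(unsigned int) == 4,
    "sketch assumes a 32-bit unsigned int");

/* Function form: arguments are converted to unsigned int at the call. */
static unsigned int
model_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no truncation, but arguments may be evaluated twice. */
#define MODEL_MIN(a, b)	((a) < (b) ? (a) : (b))
```

With a 64-bit value of 2^32 + 5, the function form silently sees 5 and returns it as the minimum against 10, while the macro form compares the full value and returns 10.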


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.238 01-Jun-2018 ozaki-r

branches: 1.238.2;
Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents the
rtentry from being accidentally released. So rt_timer handlers must release the
reference of a passed rtentry by themselves (but they didn't).


Revision tags: pgoyette-compat-0521
# 1.237 07-May-2018 maxv

Remove misleading comments.


Revision tags: pgoyette-compat-0502
# 1.236 01-May-2018 maxv

Remove now unused net_osdep.h includes, the other BSDs did the same.


# 1.235 29-Apr-2018 maxv

Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.


# 1.234 28-Apr-2018 maxv

Remove unused ipsec_var.h includes.


# 1.233 27-Apr-2018 maxv

Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of
MCLBYTES, so the area allocated is still too small.

I think it should have been MEXTMALLOC, and of course I can't test my
change.


# 1.232 26-Apr-2018 maxv

Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.


# 1.231 26-Apr-2018 maxv

Use M_UNWRITABLE, no functional change.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.230 14-Apr-2018 maxv

Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to reliably craft a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.


# 1.229 14-Apr-2018 maxv

Cosmetic, and remove one XXX (no problem).


# 1.228 14-Apr-2018 maxv

Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.


# 1.227 14-Apr-2018 maxv

Remove dead code. It is the same as the non-obsolete one, since
ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE,
and the code leads to the same errno value (EHOSTUNREACH).


# 1.226 12-Apr-2018 maxv

Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.


# 1.225 12-Apr-2018 maxv

Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.224 21-Mar-2018 roy

Sprinkle more soroverflow().


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.223 28-Feb-2018 maxv

branches: 1.223.2;
Remove unused ipsec_private.h includes.


# 1.222 26-Feb-2018 maxv

Remove redundant condition (harmless). PR/53030.


# 1.221 26-Feb-2018 maxv

Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@


# 1.220 12-Feb-2018 maxv

Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.


# 1.219 23-Jan-2018 maxv

Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.


# 1.218 23-Jan-2018 maxv

Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.


# 1.217 23-Jan-2018 maxv

Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
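
A minimal userland sketch of the kind of fix described above, assuming a
simplified two-byte option header; struct opt_hdr and fill_lladdr_opt()
are hypothetical stand-ins, not the kernel code. The point is the final
memset() over the bytes added by the roundup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROUNDUP8(n)    (((n) + 7) & ~(size_t)7)

/* Simplified stand-in for an ND option header: type and length. */
struct opt_hdr {
    uint8_t type;
    uint8_t len;    /* in units of 8 bytes */
};

/*
 * Write a link-layer address option into buf.  Returns the number of
 * bytes used, or 0 if the option does not fit.
 */
size_t
fill_lladdr_opt(uint8_t *buf, size_t buflen, uint8_t type,
    const uint8_t *lladdr, size_t addrlen)
{
    size_t used = sizeof(struct opt_hdr) + addrlen;
    size_t total = ROUNDUP8(used);
    struct opt_hdr hdr;

    if (total > buflen || total / 8 > UINT8_MAX)
        return 0;

    hdr.type = type;
    hdr.len = (uint8_t)(total / 8);
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), lladdr, addrlen);
    /* Zero the padding so stale buffer contents never reach the wire. */
    memset(buf + used, 0, total - used);
    return total;
}
```

With a 6-byte Ethernet address the option is exactly 8 bytes and there is
no padding; with an 8-byte address, 6 padding bytes follow the address.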


# 1.216 23-Jan-2018 maxv

Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.


# 1.215 23-Jan-2018 maxv

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.


Revision tags: tls-maxphys-base-20171202
# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
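
The index-instead-of-pointer idea can be sketched as follows. This is a
userland model with hypothetical names (ifnet_model, pkt_model, if_table);
it leaves out the psref(9)/pserialize(9) protection entirely and shows only
why a lookup by index can fail cleanly where a stale pointer could not.

```c
#include <assert.h>
#include <stddef.h>

#define IF_TABLE_SIZE 8

struct ifnet_model {
    unsigned int index;
    const char *name;
};

/* Index-to-interface table; index 0 is reserved for "no interface". */
static struct ifnet_model *if_table[IF_TABLE_SIZE];

struct pkt_model {
    unsigned int rcvif_index;    /* stored instead of a struct pointer */
};

void
if_attach_model(struct ifnet_model *ifp)
{
    assert(ifp->index > 0 && ifp->index < IF_TABLE_SIZE);
    if_table[ifp->index] = ifp;
}

void
if_detach_model(struct ifnet_model *ifp)
{
    if_table[ifp->index] = NULL;
}

/* Returns the receiving interface, or NULL if it no longer exists. */
struct ifnet_model *
pkt_get_rcvif_model(const struct pkt_model *p)
{
    if (p->rcvif_index == 0 || p->rcvif_index >= IF_TABLE_SIZE)
        return NULL;
    return if_table[p->rcvif_index];
}
```

Callers have to handle the NULL return, because the interface may have
been destroyed while the packet was still queued.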


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

Details of the behavior changes can be seen in the diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
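
The conversion between the two time bases can be sketched as plain
arithmetic. The helper names below are hypothetical and the clocks are
passed in as parameters so the sketch stays self-contained; it is not the
kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an expiry kept on the monotonic clock to wall-clock time. */
int64_t
uptime_to_wall(int64_t expire_uptime, int64_t now_uptime, int64_t now_wall)
{
    return expire_uptime - now_uptime + now_wall;
}

/* Convert a wall-clock expiry from userland to the monotonic clock. */
int64_t
wall_to_uptime(int64_t expire_wall, int64_t now_uptime, int64_t now_wall)
{
    return expire_wall - now_wall + now_uptime;
}

void
uptime_example(void)
{
    /* Entry expires 30 seconds from now on the monotonic clock. */
    assert(uptime_to_wall(130, 100, 1000000) == 1000030);
    assert(wall_to_uptime(1000030, 100, 1000000) == 130);

    /*
     * Ten seconds later the wall clock has been stepped forward, but
     * the remaining lifetime is still 20 seconds.
     */
    assert(130 - 110 == 20);
    assert(uptime_to_wall(130, 110, 2000000) == 2000020);
}
```

Because the expiry is stored on the monotonic base, a step of the wall
clock changes only the value reported to userland, not the lifetime.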


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
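
The three rules above can be sketched with a simplified userland model.
rt_model, rt_lookup_model() and rt_free_model() are hypothetical names;
the routing table is modeled as holding one reference of its own, and
freeing is represented by a flag rather than by releasing memory.

```c
#include <assert.h>
#include <stdbool.h>

struct rt_model {
    int refcnt;
    bool freed;    /* stands in for actually releasing the memory */
};

/* Functions that return an entry take a reference for the caller. */
struct rt_model *
rt_lookup_model(struct rt_model *rt)
{
    rt->refcnt++;
    return rt;
}

/* The only way to drop a reference; no open-coded refcnt-- elsewhere. */
void
rt_free_model(struct rt_model *rt)
{
    assert(rt->refcnt > 0);
    if (--rt->refcnt == 0)
        rt->freed = true;
}

void
refcnt_example(void)
{
    struct rt_model rt = { 1, false };    /* reference held by the table */
    struct rt_model *r;

    r = rt_lookup_model(&rt);             /* local variable now counted */
    assert(r->refcnt == 2);
    rt_free_model(r);                     /* done with it; do not touch r */
    assert(rt.refcnt == 1 && !rt.freed);

    rt_free_model(&rt);                   /* table drops its reference */
    assert(rt.freed);
}
```

As the message notes, this protects only against freeing, not against
concurrent updates of the entry.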


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
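
The sockaddr routines in item 1 can be sketched in userland roughly
as follows. This is only an illustration under stated assumptions:
every u_-prefixed name is hypothetical, calloc()/free() stand in for
the per-family pools, the 'flags' argument is omitted, and the family
numbers and sizes are illustrative rather than taken from the tree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sockaddr (BSD layout, sa_len first). */
struct u_sockaddr {
	uint8_t sa_len;		/* total length of this sockaddr */
	uint8_t sa_family;	/* address family */
	char	sa_data[14];	/* family-specific payload (may be longer) */
};

enum { U_AF_INET = 2, U_AF_INET6 = 24 };	/* illustrative values */

/* Size of a sockaddr for the given family, or 0 if unknown. */
static size_t
u_sockaddr_size(uint8_t af)
{
	switch (af) {
	case U_AF_INET:
		return 16;
	case U_AF_INET6:
		return 28;
	default:
		return 0;
	}
}

/* Allocate a sockaddr of the right size; sa_len/sa_family are preset. */
static struct u_sockaddr *
u_sockaddr_alloc(uint8_t af)
{
	size_t len = u_sockaddr_size(af);
	struct u_sockaddr *sa;

	if (len == 0 || (sa = calloc(1, len)) == NULL)
		return NULL;
	sa->sa_len = (uint8_t)len;
	sa->sa_family = af;
	return sa;
}

/* Copy src over dst; like strcpy(), dst must already be big enough. */
static struct u_sockaddr *
u_sockaddr_copy(struct u_sockaddr *dst, const struct u_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);	/* mimics the KASSERT */
	assert(dst->sa_len >= src->sa_len);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Allocate a new sockaddr and copy src into it; like strdup(). */
static struct u_sockaddr *
u_sockaddr_dup(const struct u_sockaddr *src)
{
	struct u_sockaddr *dst = u_sockaddr_alloc(src->sa_family);

	if (dst == NULL)
		return NULL;
	return u_sockaddr_copy(dst, src);
}

/* Return the sockaddr to its allocator. */
static void
u_sockaddr_free(struct u_sockaddr *sa)
{
	free(sa);
}
```

As with rtcache_setdst() above, a caller of this sketch has to
expect NULL from the alloc and dup routines and handle it.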


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
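
A minimal userland sketch of the chain described above, assuming a
doubly linked list threaded through the caches. The u_-prefixed
names and the struct layout are hypothetical stand-ins, not the
kernel's definitions; the sketch only mirrors the behaviour stated
in this message (flush sets ro_rt to NULL and unlinks the cache).

```c
#include <assert.h>
#include <stddef.h>

struct u_rtentry {
	int rt_refcnt;			/* references held by caches */
};

struct u_route {
	struct u_rtentry *ro_rt;	/* cached route, or NULL */
	struct u_route	 *ro_next;	/* next live cache in the chain */
	struct u_route	**ro_prevp;	/* pointer to the link pointing here */
};

static struct u_route *u_rtcache_head;	/* chain of caches, ro_rt != NULL */

/* Add a freshly filled cache to the chain. */
static void
u_rtcache(struct u_route *ro)
{
	assert(ro->ro_rt != NULL);
	ro->ro_next = u_rtcache_head;
	ro->ro_prevp = &u_rtcache_head;
	if (u_rtcache_head != NULL)
		u_rtcache_head->ro_prevp = &ro->ro_next;
	u_rtcache_head = ro;
}

/* Invalidate one cache: drop the reference, forget it, unlink it. */
static void
u_rtflush(struct u_route *ro)
{
	if (ro->ro_rt == NULL)
		return;
	ro->ro_rt->rt_refcnt--;
	ro->ro_rt = NULL;
	*ro->ro_prevp = ro->ro_next;
	if (ro->ro_next != NULL)
		ro->ro_next->ro_prevp = ro->ro_prevp;
	ro->ro_next = NULL;
	ro->ro_prevp = NULL;
}

/* Invalidate every cache on the chain. */
static void
u_rtflushall(void)
{
	while (u_rtcache_head != NULL)
		u_rtflush(u_rtcache_head);
}
```

Users of a cache in this sketch must be prepared for ro_rt to turn
to NULL at any flush, which is the same rule this change imposes on
ipflow_fastforward() and the other users of route caches.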

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
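
The stricter boundary check for the ::1 case can be illustrated
with a small userland sketch. u_scope_ok() and u_scope_ok_str() are
hypothetical helpers written for this note; they cover only the
loopback example given above, not the full scope-zone logic of the
kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Return false when a loopback source is paired with a non-loopback
 * destination: ::1 is only meaningful within a single node, so such
 * a packet must not be sent.
 */
static bool
u_scope_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
	assert(src != NULL && dst != NULL);
	if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
		return false;
	return true;
}

/* Convenience wrapper taking textual addresses; false on parse error. */
static bool
u_scope_ok_str(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return false;
	return u_scope_ok(&s, &d);
}
```

With this sketch, the bind("::1") plus sendto(global address)
sequence shown above corresponds to u_scope_ok_str("::1",
"2001:db8::1"), which is rejected.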

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
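
The hiwat/lowat rule above can be sketched as a single predicate.
This is a simplification under stated assumptions: the function
name is hypothetical, the limits are passed in as plain arguments,
and victim selection (the XXX above) is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * May another PMTUD route entry be created?  Traffic validated by
 * xx_ctlinput is held to the higher limit, unvalidated traffic to
 * the lower one, so the routing table cannot grow without bound.
 */
static bool
u_pmtud_may_add(int nentries, bool validated, int hiwat, int lowat)
{
	int limit = validated ? hiwat : lowat;

	assert(nentries >= 0);
	return nentries < limit;
}
```

For example, with illustrative limits hiwat = 1280 and lowat = 256
and 300 entries already present, a validated "too big" message may
still add an entry while an unvalidated one may not.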


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
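
For illustration, an event-per-second limiter of this kind can be
sketched as below. The struct, the function name, and the explicit
now_sec parameter are assumptions made for this note so that the
sketch is testable in userland; it is not the kernel routine's
signature.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct u_pps {
	long last_sec;	/* second in which the current window started */
	int  count;	/* events seen during that second */
};

/*
 * Count one event at time now_sec.  Return 1 if it is within the
 * per-second budget maxpps, 0 if it should be suppressed.
 */
static int
u_pps_check(struct u_pps *st, long now_sec, int maxpps)
{
	assert(st != NULL);
	if (now_sec != st->last_sec) {
		/* A new second has begun: start a fresh window. */
		st->last_sec = now_sec;
		st->count = 0;
	}
	if (st->count < INT_MAX)	/* saturate rather than overflow */
		st->count++;
	return st->count <= maxpps;
}
```

An ICMPv6 error path using this sketch would call u_pps_check()
once per candidate error and drop the error when it returns 0.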


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.251 22-Aug-2022 knakahara

Add sysctl entry to enable/disable path MTU discovery for icmpv6 reflecting.

If we want to use path MTU discovery for icmp reflecting, set
net.inet6.icmp6.reflect_pmtu=1. The default (=0) is the same as before, that is,
use IPV6_MINMTU.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.250 19-Feb-2021 christos

- Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
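
The difference between a sizeof-based and an alignof-based check can be sketched in userland C. This is a simplified illustration, not the kernel macro itself: the macro names, struct hdr and the helper functions are hypothetical, and __alignof__ is the GCC/Clang spelling. It assumes a typical ABI where uint32_t has 4-byte alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical non-primitive type: 12 bytes large, 4-byte aligned. */
struct hdr {
	uint32_t a;
	uint32_t b;
	uint32_t c;
};

/* Old style: masks with sizeof(t) - 1, which is only meaningful when
 * sizeof(t) happens to be a power of two equal to the alignment. */
#define ALIGNED_BY_SIZEOF(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* New style: masks with the ABI alignment of the type. */
#define ALIGNED_BY_ALIGNOF(p, t) \
	((((uintptr_t)(p)) & (__alignof__(t) - 1)) == 0)

static int
ok_by_sizeof(uintptr_t addr)
{
	/* Documented assumption about the ABI used for this sketch. */
	assert(sizeof(struct hdr) == 12 && __alignof__(struct hdr) == 4);
	return ALIGNED_BY_SIZEOF(addr, struct hdr);
}

static int
ok_by_alignof(uintptr_t addr)
{
	assert(sizeof(struct hdr) == 12 && __alignof__(struct hdr) == 4);
	return ALIGNED_BY_ALIGNOF(addr, struct hdr);
}
```

Under these assumptions, address 8 is correctly aligned for struct hdr, yet the sizeof-based check rejects it because 12 - 1 is not a valid alignment mask.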


# 1.249 15-Feb-2021 martin

Fix the build.
Maybe there should be a ICMP6_HDR_ALIGNMENT, but for now there is
only IP6_HDR_ALIGNMENT.


# 1.248 14-Feb-2021 christos

- centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.


# 1.247 11-Sep-2020 roy

branches: 1.247.2;
inet6: Use generic Neighbor Detection rather than IPv6 specific

No functional change intended.


# 1.246 27-Jul-2020 roy

icmp6: Remove __packed attribute from icmp6 structures

They should naturally align.
Add compile time assertions to icmp6.c to prove this.


# 1.245 12-Jun-2020 roy

Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.244 09-Mar-2020 roy

route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes that probe for machines
that don't exist on a subnet / prefix.


Revision tags: is-mlppp-base ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.243 06-Oct-2019 uwe

icmp6_notify_error - fix ctlfunc typedef to match pr_ctlinput,
drop the cast that is no longer necessary.


Revision tags: netbsd-9-3-RELEASE netbsd-9-2-RELEASE netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.242 22-Dec-2018 maxv

Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.


# 1.241 22-Dec-2018 maxv

Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.


Revision tags: pgoyette-compat-1126
# 1.240 25-Oct-2018 ozaki-r

Remove a leftover debug printf

Pointed out by hannken@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.239 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
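
The truncation hazard described above can be reproduced in a few lines of C. This is only a sketch: old_min() is a hypothetical stand-in for an unsigned-int min(), GENERIC_MIN is a hypothetical type-preserving macro in the style of MIN, and the explicit casts spell out the conversion a compiler would otherwise perform silently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a min() defined on unsigned int. */
static unsigned int
old_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro: may evaluate arguments twice, but never narrows. */
#define GENERIC_MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Returns 1 if routing two 64-bit values through the unsigned-int
 * function gives a different answer than the type-preserving macro.
 */
static int
min_truncates(uint64_t a, uint64_t b)
{
	/* The sketch only makes sense where unsigned int is narrower. */
	assert(sizeof(unsigned int) < sizeof(uint64_t));

	/* These casts are what an implicit call would do without warning. */
	uint64_t narrow = old_min((unsigned int)a, (unsigned int)b);
	uint64_t wide = GENERIC_MIN(a, b);

	return narrow != wide;
}
```

For example, with a = 0x100000001 and b = 2 the narrowed call compares 1 against 2 and returns 1, while the true minimum is 2.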


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.238 01-Jun-2018 ozaki-r

branches: 1.238.2;
Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that keeps the
rtentry from being accidentally released. So rt_timer handlers must release the
reference of a passed rtentry by themselves (but they didn't).


Revision tags: pgoyette-compat-0521
# 1.237 07-May-2018 maxv

Remove misleading comments.


Revision tags: pgoyette-compat-0502
# 1.236 01-May-2018 maxv

Remove now unused net_osdep.h includes, the other BSDs did the same.


# 1.235 29-Apr-2018 maxv

Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.


# 1.234 28-Apr-2018 maxv

Remove unused ipsec_var.h includes.


# 1.233 27-Apr-2018 maxv

Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of
MCLBYTES, so the area allocated is still too small.

I think it should have been MEXTMALLOC, and of course I can't test my
change.


# 1.232 26-Apr-2018 maxv

Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.


# 1.231 26-Apr-2018 maxv

Use M_UNWRITABLE, no functional change.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.230 14-Apr-2018 maxv

Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to craft reliably a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.


# 1.229 14-Apr-2018 maxv

Cosmetic, and remove one XXX (no problem).


# 1.228 14-Apr-2018 maxv

Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.


# 1.227 14-Apr-2018 maxv

Remove dead code. It is the same as the non-obsolete one, since
ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE,
and the code leads to the same errno value (EHOSTUNREACH).


# 1.226 12-Apr-2018 maxv

Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.


# 1.225 12-Apr-2018 maxv

Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.224 21-Mar-2018 roy

Sprinkle more soroverflow().


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.223 28-Feb-2018 maxv

branches: 1.223.2;
Remove unused ipsec_private.h includes.


# 1.222 26-Feb-2018 maxv

Remove redundant condition (harmless). PR/53030.


# 1.221 26-Feb-2018 maxv

Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@


# 1.220 12-Feb-2018 maxv

Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.


# 1.219 23-Jan-2018 maxv

Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.


# 1.218 23-Jan-2018 maxv

Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.


# 1.217 23-Jan-2018 maxv

Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
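
A minimal userland sketch of this class of fix follows. It is not the kernel code: struct opt_hdr, ROUNDUP8 and fill_lladdr_opt() are hypothetical names, and the option layout is reduced to a type byte, a length byte (in units of 8 bytes) and the address. The point is that the whole rounded-up slot is cleared before use, so the padding never carries stale bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ROUNDUP8(x) ((((x) + 7) / 8) * 8)

/* Hypothetical 2-byte option header: type, then length in 8-byte units. */
struct opt_hdr {
	unsigned char type;
	unsigned char len;
};

/*
 * Write an option carrying a link-layer address into buf and zero the
 * padding introduced by the roundup. Returns the option size in bytes,
 * or 0 if buf is too small.
 */
static size_t
fill_lladdr_opt(unsigned char *buf, size_t buflen,
    const unsigned char *lladdr, size_t addrlen)
{
	size_t used = sizeof(struct opt_hdr) + addrlen;
	size_t total = ROUNDUP8(used);

	if (total > buflen)
		return 0;
	assert(total >= used);

	memset(buf, 0, total);		/* clears the padding bytes as well */
	buf[0] = 2;			/* target link-layer address option */
	buf[1] = (unsigned char)(total / 8);
	memcpy(buf + sizeof(struct opt_hdr), lladdr, addrlen);
	return total;
}
```

With a 7-byte address the slot is 16 bytes, of which the last 7 are padding; without the memset() those bytes would keep whatever the buffer held before.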


# 1.216 23-Jan-2018 maxv

Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.


# 1.215 23-Jan-2018 maxv

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.


Revision tags: tls-maxphys-base-20171202
# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

See the diffs under tests/ for details of the behavior changes.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
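
The conversion between the two time bases can be sketched as plain arithmetic. The helper names below are hypothetical and the current clock values are passed in as parameters so the example stays self-contained; the sketch assumes the offset between wall-clock seconds and uptime seconds is sampled at the moment of conversion.

```c
#include <assert.h>
#include <time.h>

/*
 * Convert a timestamp kept on the monotonic (uptime) base to the
 * wall-clock base, given the current value of both clocks.
 */
static time_t
uptime_to_wall(time_t t_uptime, time_t now_uptime, time_t now_wall)
{
	/* Wall-clock seconds since the epoch should exceed the uptime. */
	assert(now_wall >= now_uptime);
	return t_uptime + (now_wall - now_uptime);
}

/* Inverse conversion, for values arriving from userland. */
static time_t
wall_to_uptime(time_t t_wall, time_t now_uptime, time_t now_wall)
{
	assert(now_wall >= now_uptime);
	return t_wall - (now_wall - now_uptime);
}

/* Expiry checks stay on the monotonic base and ignore clock steps. */
static int
expired(time_t now_uptime, time_t expire_uptime)
{
	return now_uptime >= expire_uptime;
}
```

Keeping expirations on the uptime base means a settimeofday(2) step changes only the offset used at the userland boundary, not the outcome of expired().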


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of the
address part at `slenp', if `slenp' is not NULL.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
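
The alignment-test macros described above can be pictured roughly as
follows. This is only a sketch: the macro name HDR_ALIGNED_P and the
4-byte alignment requirement are assumptions made for illustration.
The one detail taken from the log entry is that the test expands to 1
when __NO_STRICT_ALIGNMENT is defined.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the *_HDR_ALIGNED_P() family.  The 4-byte
 * requirement (mask of 3) is an assumption of this sketch.
 */
#ifdef __NO_STRICT_ALIGNMENT
/* No strict alignment constraints: the test costs nothing. */
#define HDR_ALIGNED_P(p)	1
#else
/* Strict alignment: true only if the pointer is 4-byte aligned. */
#define HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif
```

A caller would test the header pointer with the macro and take the
copy-and-align path only when the test fails.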


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
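
The bitmap reading of net.inet6.icmp6.nodeinfo can be sketched in C.
This is a minimal illustration, not the kernel code: the flag names
NODEINFO_PING6_W and NODEINFO_PING6_A and the helper nodeinfo_allows()
are hypothetical. Only the bit positions (2^0 for ping6 -w, 2^1 for
ping6 -a) come from the log entry.

```c
#include <stdbool.h>

/* Hypothetical names; only the bit values come from the log entry. */
#define NODEINFO_PING6_W	(1 << 0)	/* 2^0: "ping6 -w" queries */
#define NODEINFO_PING6_A	(1 << 1)	/* 2^1: "ping6 -a" queries */

/* Return true if every bit in `query' is set in the sysctl value. */
static bool
nodeinfo_allows(int sysctl_value, int query)
{
	return (sysctl_value & query) == query;
}
```

With this reading, a value of 3 enables both kinds of query, 1 enables
only the ping6 -w kind, and 0 disables both.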


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
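
Event-per-second limiting of the kind mentioned above can be sketched
in userland C. The names pps_state and pps_allow() are hypothetical and
are not the kernel interface. The caller passes in the current time in
seconds, which is an assumption made to keep the sketch deterministic.

```c
#include <stdbool.h>

/* Hypothetical state for a per-second event limiter. */
struct pps_state {
	long	window;	/* second in which `count' was accumulated */
	int	count;	/* events seen during `window' */
};

/*
 * Return true if an event at time `now' (in seconds) stays within
 * `maxpps' events for that second, false if it should be suppressed.
 */
static bool
pps_allow(struct pps_state *st, long now, int maxpps)
{
	if (now != st->window) {	/* new second: restart the count */
		st->window = now;
		st->count = 0;
	}
	st->count++;
	return st->count <= maxpps;
}
```

An ICMPv6 error path would consult such a check before building an
error reply and simply drop the reply when the check returns false.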


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield changes from the 04 draft to the 05 draft, which make
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.250 19-Feb-2021 christos

- Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
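
The difference between a sizeof-based and an __alignof-based check can be sketched as follows. This is a userland illustration using C11 alignof; struct demo_hdr and the two macros are hypothetical names for this sketch and are not the kernel's definitions.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte header whose members only need 2-byte alignment. */
struct demo_hdr {
	uint16_t a;
	uint16_t b;
	uint16_t c;
	uint16_t d;
};

/*
 * Old-style check: uses the size of the type as if it were its alignment.
 * Too strict for structs, and the mask trick is only meaningful when the
 * size happens to be a power of two.
 */
#define ALIGNED_BY_SIZE(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* New-style check: uses the alignment the compiler requires for the type. */
#define ALIGNED_BY_ALIGNOF(p, t) \
	((((uintptr_t)(p)) & (alignof(t) - 1)) == 0)
```

For a pointer two bytes past an 8-byte boundary, the sizeof-based check rejects struct demo_hdr while the alignof-based check accepts it, which matches what the ABI actually requires on common platforms.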


# 1.249 15-Feb-2021 martin

Fix the build.
Maybe there should be a ICMP6_HDR_ALIGNMENT, but for now there is
only IP6_HDR_ALIGNMENT.


# 1.248 14-Feb-2021 christos

- centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.


Revision tags: thorpej-futex-base
# 1.247 11-Sep-2020 roy

inet6: Use generic Neighbor Detection rather than IPv6 specific

No functional change intended.


# 1.246 27-Jul-2020 roy

icmp6: Remove __packed attribute from icmp6 structures

They should naturally align.
Add compile time assertions to icmp6.c to prove this.


# 1.245 12-Jun-2020 roy

Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.244 09-Mar-2020 roy

route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes that probe for machines
that don't exist on a subnet / prefix.


Revision tags: is-mlppp-base ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.243 06-Oct-2019 uwe

icmp6_notify_error - fix ctlfunc typedef to match pr_ctlinput,
drop the cast that is no longer necessary.


Revision tags: netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.242 22-Dec-2018 maxv

Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.


# 1.241 22-Dec-2018 maxv

Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.


Revision tags: pgoyette-compat-1126
# 1.240 25-Oct-2018 ozaki-r

Remove a leftover debug printf

Pointed out by hannken@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.239 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
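
The truncation hazard described above can be reproduced in a small userland sketch. demo_uimin and DEMO_MIN are stand-ins written for this illustration (not the libkern definitions), and the example assumes a 32-bit unsigned int.

```c
#include <stdint.h>

/* Stand-in for a helper defined on unsigned int only. */
static unsigned int
demo_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Type-preserving macro in the style of MIN(): no truncation, but its
 * arguments may be evaluated more than once.
 */
#define DEMO_MIN(a, b)	((a) < (b) ? (a) : (b))
```

Passing the 64-bit value 2^32 + 5 together with 100 to demo_uimin() silently converts the first argument to 5 and returns 5, whereas DEMO_MIN() compares in 64 bits and yields 100. With the old generic name min, nothing at the call site hinted at this.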


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.238 01-Jun-2018 ozaki-r

branches: 1.238.2;
Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents the
rtentry from being accidentally released. So rt_timer handlers must release the
reference of a passed rtentry by themselves (but they didn't).


Revision tags: pgoyette-compat-0521
# 1.237 07-May-2018 maxv

Remove misleading comments.


Revision tags: pgoyette-compat-0502
# 1.236 01-May-2018 maxv

Remove now unused net_osdep.h includes, the other BSDs did the same.


# 1.235 29-Apr-2018 maxv

Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.


# 1.234 28-Apr-2018 maxv

Remove unused ipsec_var.h includes.


# 1.233 27-Apr-2018 maxv

Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of
MCLBYTES, so the area allocated is still too small.

I think it should have been MEXTMALLOC, and of course I can't test my
change.


# 1.232 26-Apr-2018 maxv

Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.


# 1.231 26-Apr-2018 maxv

Use M_UNWRITABLE, no functional change.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.230 14-Apr-2018 maxv

Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to craft reliably a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.


# 1.229 14-Apr-2018 maxv

Cosmetic, and remove one XXX (no problem).


# 1.228 14-Apr-2018 maxv

Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.


# 1.227 14-Apr-2018 maxv

Remove dead code. It is the same as the non-obsolete one, since
ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE,
and the code leads to the same errno value (EHOSTUNREACH).


# 1.226 12-Apr-2018 maxv

Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.


# 1.225 12-Apr-2018 maxv

Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.224 21-Mar-2018 roy

Sprinkle more soroverflow().


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.223 28-Feb-2018 maxv

branches: 1.223.2;
Remove unused ipsec_private.h includes.


# 1.222 26-Feb-2018 maxv

Remove redundant condition (harmless). PR/53030.


# 1.221 26-Feb-2018 maxv

Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@


# 1.220 12-Feb-2018 maxv

Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.


# 1.219 23-Jan-2018 maxv

Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.


# 1.218 23-Jan-2018 maxv

Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.


# 1.217 23-Jan-2018 maxv

Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
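
A minimal sketch of the general remedy, zeroing the whole rounded-up slot before filling it, is shown below. The struct, macro and function names are hypothetical and the option layout is simplified; this is not the code from the commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ROUNDUP(x, y)	((((x) + ((y) - 1)) / (y)) * (y))

/* Hypothetical 2-byte option header: type, then length in 8-byte units. */
struct demo_opt_hdr {
	uint8_t type;
	uint8_t len;
};

/*
 * Write an option carrying `addrlen` bytes of link-layer address into
 * `buf`.  Returns the padded option length, or 0 if it does not fit.
 */
static size_t
demo_fill_lladdr_opt(uint8_t *buf, size_t buflen, uint8_t type,
    const uint8_t *addr, size_t addrlen)
{
	size_t optlen = DEMO_ROUNDUP(sizeof(struct demo_opt_hdr) + addrlen, 8);

	if (optlen > buflen || optlen / 8 > UINT8_MAX)
		return 0;
	/* Zero the whole slot first so the padding never carries stale data. */
	memset(buf, 0, optlen);
	buf[0] = type;
	buf[1] = (uint8_t)(optlen / 8);
	memcpy(buf + sizeof(struct demo_opt_hdr), addr, addrlen);
	return optlen;
}
```

With an 8-byte address the option occupies 16 bytes, of which the last 6 are padding; without the memset those 6 bytes would go out on the wire with whatever the buffer previously held.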


# 1.216 23-Jan-2018 maxv

Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.


# 1.215 23-Jan-2018 maxv

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.


Revision tags: tls-maxphys-base-20171202
# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
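
The index-instead-of-pointer idea can be sketched as below. All names (demo_ifnet, demo_pkt, demo_ifindex2ifnet, demo_get_rcvif) are hypothetical, and the pserialize(9)/psref(9) protection that the real APIs provide is deliberately left out of this single-threaded illustration.

```c
#include <stddef.h>

#define DEMO_MAXIF	4

/* Hypothetical interface object. */
struct demo_ifnet {
	unsigned int if_index;
	const char  *name;
};

/* Hypothetical interface collection, indexed by if_index. */
static struct demo_ifnet *demo_ifindex2ifnet[DEMO_MAXIF];

/* The packet records which interface it arrived on by index only. */
struct demo_pkt {
	unsigned int rcvif_index;
};

/*
 * Look the interface up at the time of use.  Returns NULL if the index is
 * out of range or the interface has been destroyed in the meantime, so
 * callers must check the result.
 */
static struct demo_ifnet *
demo_get_rcvif(const struct demo_pkt *p)
{
	if (p->rcvif_index >= DEMO_MAXIF)
		return NULL;
	return demo_ifindex2ifnet[p->rcvif_index];
}
```

Because the lookup can fail once an interface is detached, every caller needs a NULL check on the returned pointer.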


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
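
The conversion between the two time bases mentioned above amounts to adding or subtracting the current offset between wall-clock time and uptime. A sketch with hypothetical helper names, taking the two "now" values as parameters rather than reading kernel variables:

```c
#include <time.h>

/*
 * Convert a timestamp kept on the monotonic uptime base into wall-clock
 * seconds, e.g. before handing it to a userland program.
 */
static time_t
demo_uptime_to_wallclock(time_t t_uptime, time_t now_uptime, time_t now_wall)
{
	return t_uptime + (now_wall - now_uptime);
}

/* The reverse conversion, for a wall-clock value received from userland. */
static time_t
demo_wallclock_to_uptime(time_t t_wall, time_t now_uptime, time_t now_wall)
{
	return t_wall - (now_wall - now_uptime);
}
```

An expiry stored on the uptime base keeps the same remaining lifetime even if the wall clock is stepped; only the value reported to userland shifts with the clock.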


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
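
The three rules listed above can be sketched in a simplified, single-threaded form. The names are hypothetical, there is no locking, and the demo_frees counter exists only so that the behaviour can be observed; it is not how route.c is written.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_rtentry {
	int refcnt;
	int metric;
};

static int demo_frees;	/* number of entries actually released */

/* Create an entry holding one reference (the routing table's own). */
static struct demo_rtentry *
demo_rt_alloc(int metric)
{
	struct demo_rtentry *rt = malloc(sizeof(*rt));

	if (rt != NULL) {
		rt->refcnt = 1;
		rt->metric = metric;
	}
	return rt;
}

/* Functions returning an entry increment its count on the caller's behalf. */
static struct demo_rtentry *
demo_rt_lookup(struct demo_rtentry *rt)
{
	rt->refcnt++;
	return rt;
}

/* The only way to drop a reference; frees the entry on the last one. */
static void
demo_rtfree(struct demo_rtentry *rt)
{
	if (--rt->refcnt == 0) {
		free(rt);
		demo_frees++;
	}
}
```

A caller that obtained the entry through demo_rt_lookup() can keep using it after the table has dropped its own reference, and the entry is released only when that caller calls demo_rtfree() in turn.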


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
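
A minimal sketch of why such a conversion can stay ABI-compatible, assuming the old structure consisted only of 64-bit counters: each array index then lands on the offset of the member it replaces. All names below (demo_oldstat, DEMO_STAT_*, demo_stat_inc, demo_export) are hypothetical and are not the kernel's identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style statistics structure: every member is a
 * 64-bit counter, so there is no padding between members. */
struct demo_oldstat {
	uint64_t ds_error;
	uint64_t ds_tooshort;
	uint64_t ds_reflect;
};

/* Hypothetical array indices, one per former member, in the same order. */
enum {
	DEMO_STAT_ERROR = 0,
	DEMO_STAT_TOOSHORT = 1,
	DEMO_STAT_REFLECT = 2,
	DEMO_NSTATS = 3
};

/* The two layouts agree only if size and member offsets line up. */
_Static_assert(sizeof(struct demo_oldstat) == DEMO_NSTATS * sizeof(uint64_t),
    "array and structure must have the same size");
_Static_assert(offsetof(struct demo_oldstat, ds_reflect) ==
    DEMO_STAT_REFLECT * sizeof(uint64_t),
    "index must map to the old member offset");

static uint64_t demo_stats[DEMO_NSTATS];

/* Increment a counter by index, as array-based code would. */
static void
demo_stat_inc(unsigned int idx)
{
	assert(idx < DEMO_NSTATS);
	demo_stats[idx]++;
}

/* Copy the array out byte for byte, as an old consumer expecting the
 * structure would receive it. */
static struct demo_oldstat
demo_export(void)
{
	struct demo_oldstat s;

	memcpy(&s, demo_stats, sizeof(s));
	return s;
}
```

The compile-time checks are the point of the sketch: if a member of another width were ever added, the build would fail rather than silently shifting the counters an old binary reads.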


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop ( and local dos ) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
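
A rough sketch of the macro pattern described above, assuming a 4-byte alignment requirement. The DEMO_* names are hypothetical stand-ins for the real *_HDR_ALIGNED_P() macros and __NO_STRICT_ALIGNMENT, and the exact mask used in the kernel is not stated in this log.

```c
#include <stdint.h>

/*
 * Hypothetical alignment predicate. DEMO_NO_STRICT_ALIGNMENT plays
 * the role of __NO_STRICT_ALIGNMENT: when it is defined, the test
 * collapses to the constant 1 and the compiler can drop the
 * align-if-needed path entirely.
 */
#ifdef DEMO_NO_STRICT_ALIGNMENT
#define DEMO_HDR_ALIGNED_P(p)	1
#else
/* Assumption: headers must start on a 4-byte boundary. */
#define DEMO_HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/*
 * Return nonzero when the caller would have to copy the header to an
 * aligned location (the job m_copyup() does in the change above)
 * before touching its fields.
 */
static int
demo_needs_realign(const void *hdr)
{
	return !DEMO_HDR_ALIGNED_P(hdr);
}
```

On a strict-alignment build the predicate costs one mask and compare per packet. On other architectures it is a compile-time constant, which matches the "don't pay for the test" goal stated in the bullet list.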


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
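
For illustration only, a bitmap-valued setting of this kind could be tested as sketched below. The bit positions come from the note above (2^0 for ping6 -w, 2^1 for ping6 -a). The macro and function names are hypothetical, not the kernel's.

```c
#include <stdbool.h>

/* Hypothetical names for the two bits listed in the commit message. */
#define DEMO_NODEINFO_PING6_W	(1 << 0)	/* 2^0: queries sent by ping6 -w */
#define DEMO_NODEINFO_PING6_A	(1 << 1)	/* 2^1: queries sent by ping6 -a */

/*
 * Return true when the bit for the given query type is set in the
 * configured nodeinfo value.
 */
static bool
demo_nodeinfo_allowed(int nodeinfo, int query_bit)
{
	return (nodeinfo & query_bit) != 0;
}
```

With this encoding a value of 1 would enable only the ping6 -w case, 2 only the ping6 -a case, 3 both, and 0 neither.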


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
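
The two-level limit described in this entry might look roughly like the
sketch below. The function name, its parameters, and the idea of passing the
current entry count in are assumptions made for illustration, not the code in
icmp6.c. Treating a negative limit as "no limit" follows the note in the
r1.55 entry.

```c
#include <assert.h>

/*
 * Decide whether one more path MTU discovery route entry may be created.
 * validated: nonzero if xx_ctlinput confirmed the too-big message.
 * nentries:  number of pmtud route entries that already exist.
 * Returns 1 to admit the new entry, 0 to refuse it.
 */
int
sk_pmtud_admit(int validated, int nentries, int hiwat, int lowat)
{
	/* Validated traffic gets the larger budget, the rest the smaller. */
	int limit = validated ? hiwat : lowat;

	if (limit < 0)		/* negative value: limitation turned off */
		return 1;
	return nentries < limit;
}
```

Because unvalidated traffic only ever gets the lower budget, forged too-big
messages cannot grow the routing table past lowat entries in this sketch.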


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
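
An event-per-second limiter of the kind this entry describes can be sketched
as below. This is not the kernel's implementation: the struct, the function
name, and the convention of passing the current time in as a parameter are
all hypothetical and chosen to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limiter state; field names are illustrative only. */
struct sk_pps_state {
	long last_sec;	/* second in which the current window started */
	int  count;	/* events already allowed in that second */
};

/*
 * Return 1 if the event may proceed, 0 if it should be suppressed.
 * At most maxpps events are allowed within any one-second window.
 */
int
sk_pps_allow(struct sk_pps_state *st, long now_sec, int maxpps)
{
	assert(st != NULL);

	if (now_sec != st->last_sec) {
		/* A new second has begun: start a fresh window. */
		st->last_sec = now_sec;
		st->count = 0;
	}
	if (st->count >= maxpps)
		return 0;
	st->count++;
	return 1;
}
```

A caller sending ICMPv6 errors would consult such a check before building
each error packet and silently skip the send when it returns 0.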


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield changes from the 04 draft to the 05 draft, which make
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.
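
The problem this entry fixes can be illustrated with the first octet of the
IPv6 header, whose upper four bits carry the version and whose lower four
bits carry the top of the traffic class. The sketch below is an assumption
about the shape of such a fix, not the committed diff. The constants mirror
the values of IPV6_VERSION and IPV6_VERSION_MASK, and the function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as IPV6_VERSION and IPV6_VERSION_MASK. */
#define SK_IPV6_VERSION		0x60
#define SK_IPV6_VERSION_MASK	0xf0

/* Overwrites the whole octet, so the traffic class bits are lost. */
uint8_t
sk_set_version_clobber(uint8_t vfc)
{
	(void)vfc;
	return SK_IPV6_VERSION;
}

/* Replaces only the version nibble and keeps the low four bits. */
uint8_t
sk_set_version_keep_tclass(uint8_t vfc)
{
	vfc &= (uint8_t)~SK_IPV6_VERSION_MASK;
	vfc |= SK_IPV6_VERSION;
	return vfc;
}
```

For an input octet of 0x0a the first function yields 0x60 while the second
yields 0x6a, preserving the traffic class bits.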


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.249 15-Feb-2021 martin

Fix the build.
Maybe there should be a ICMP6_HDR_ALIGNMENT, but for now there is
only IP6_HDR_ALIGNMENT.


# 1.248 14-Feb-2021 christos

- centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.


Revision tags: thorpej-futex-base
# 1.247 11-Sep-2020 roy

inet6: Use generic Neighbor Detection rather than IPv6 specific

No functional change intended.


# 1.246 27-Jul-2020 roy

icmp6: Remove __packed attribute from icmp6 structures

They should naturally align.
Add compile time assertions to icmp6.c to prove this.
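
A compile-time check of this kind might look like the sketch below. It uses
C11 _Static_assert and a simplified stand-in structure. Both the structure
name and the exact set of checks are assumptions for illustration and are not
copied from icmp6.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for the fixed 8-octet ICMPv6 header:
 * type, code, checksum, then 32 bits of message-specific data.
 */
struct sk_icmp6_hdr {
	uint8_t  type;
	uint8_t  code;
	uint16_t cksum;
	uint32_t data32;
};

/* If the compiler inserted padding, the build fails here. */
_Static_assert(sizeof(struct sk_icmp6_hdr) == 8,
    "header must be exactly 8 octets without __packed");
_Static_assert(offsetof(struct sk_icmp6_hdr, cksum) == 2,
    "checksum must sit at octet 2");
_Static_assert(offsetof(struct sk_icmp6_hdr, data32) == 4,
    "message data must sit at octet 4");
```

Since every member is already aligned to its own size, no padding is needed
and dropping a packed attribute would not change the layout.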


# 1.245 12-Jun-2020 roy

Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.244 09-Mar-2020 roy

route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes that probe for machines
that don't exist on a subnet / prefix.


Revision tags: is-mlppp-base ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.243 06-Oct-2019 uwe

icmp6_notify_error - fix ctlfunc typedef to match pr_ctlinput,
drop the cast that is no longer necessary.


Revision tags: netbsd-9-1-RELEASE netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.242 22-Dec-2018 maxv

Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.


# 1.241 22-Dec-2018 maxv

Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.


Revision tags: pgoyette-compat-1126
# 1.240 25-Oct-2018 ozaki-r

Remove a leftover debug printf

Pointed out by hannken@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.239 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
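
The truncation hazard discussed in this entry can be reproduced in isolation.
In the sketch below, sk_uimin is a hypothetical stand-in for a min function
defined on a 32-bit unsigned type, and SK_MIN stands in for the macro form.
Neither is the libkern definition, and the narrowing is written as an
explicit cast here only so the example compiles cleanly under strict warning
flags (at a real call site the conversion would be implicit).

```c
#include <assert.h>
#include <stdint.h>

/* Function form: arguments are narrowed to 32 bits before comparing. */
uint32_t
sk_uimin(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Macro form: may evaluate arguments twice, but keeps their full width. */
#define SK_MIN(a, b)	((a) < (b) ? (a) : (b))

/* 64-bit values passed through the 32-bit function lose their high bits. */
uint64_t
sk_min_via_function(uint64_t a, uint64_t b)
{
	return sk_uimin((uint32_t)a, (uint32_t)b);
}

/* The macro compares the 64-bit values directly. */
uint64_t
sk_min_via_macro(uint64_t a, uint64_t b)
{
	return SK_MIN(a, b);
}
```

For the inputs 0x100000001 and 5, the function path returns 1 because the
first argument is truncated to 1, while the macro path returns 5.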


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.238 01-Jun-2018 ozaki-r

branches: 1.238.2;
Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents the
rtentry from being accidentally released. So rt_timer handlers must release the
reference of a passed rtentry by themselves (but they didn't).


Revision tags: pgoyette-compat-0521
# 1.237 07-May-2018 maxv

Remove misleading comments.


Revision tags: pgoyette-compat-0502
# 1.236 01-May-2018 maxv

Remove now unused net_osdep.h includes, the other BSDs did the same.


# 1.235 29-Apr-2018 maxv

Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.


# 1.234 28-Apr-2018 maxv

Remove unused ipsec_var.h includes.


# 1.233 27-Apr-2018 maxv

Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of
MCLBYTES, so the area allocated is still too small.

I think it should have been MEXTMALLOC, and of course I can't test my
change.


# 1.232 26-Apr-2018 maxv

Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.


# 1.231 26-Apr-2018 maxv

Use M_UNWRITABLE, no functional change.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.230 14-Apr-2018 maxv

Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to craft reliably a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.


# 1.229 14-Apr-2018 maxv

Cosmetic, and remove one XXX (no problem).


# 1.228 14-Apr-2018 maxv

Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.


# 1.227 14-Apr-2018 maxv

Remove dead code. It is the same as the non-obsolete one, since
ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE,
and the code leads to the same errno value (EHOSTUNREACH).


# 1.226 12-Apr-2018 maxv

Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.


# 1.225 12-Apr-2018 maxv

Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.224 21-Mar-2018 roy

Sprinkle more soroverflow().


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.223 28-Feb-2018 maxv

branches: 1.223.2;
Remove unused ipsec_private.h includes.


# 1.222 26-Feb-2018 maxv

Remove redundant condition (harmless). PR/53030.


# 1.221 26-Feb-2018 maxv

Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@


# 1.220 12-Feb-2018 maxv

Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.


# 1.219 23-Jan-2018 maxv

Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.


# 1.218 23-Jan-2018 maxv

Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.


# 1.217 23-Jan-2018 maxv

Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
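
One way to avoid this class of leak is to zero the whole rounded-up slot
before filling it in. The sketch below illustrates that approach for a target
link-layer address option. It is an assumption about the shape of such a fix
rather than the committed change, and every name prefixed with sk_ or SK_ is
hypothetical. The caller is assumed to supply a slot with room for the
rounded-up length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ROUNDUP(x, y)	((((x) + ((y) - 1)) / (y)) * (y))
#define SK_ND_OPT_HDRLEN	2	/* type octet + length octet */
#define SK_ND_OPT_TARGET_LLADDR	2	/* option type defined in RFC 4861 */

/*
 * Write a link-layer address option into slot and return its total length.
 * The option length is rounded up to a multiple of 8 octets.
 */
size_t
sk_fill_lladdr_opt(uint8_t *slot, const uint8_t *lladdr, size_t addrlen)
{
	size_t len = SK_ROUNDUP(SK_ND_OPT_HDRLEN + addrlen, 8);

	/* Clear everything first so the roundup padding holds no stale data. */
	memset(slot, 0, len);
	slot[0] = SK_ND_OPT_TARGET_LLADDR;
	slot[1] = (uint8_t)(len >> 3);	/* length in units of 8 octets */
	memcpy(slot + SK_ND_OPT_HDRLEN, lladdr, addrlen);
	return len;
}
```

With an 8-octet address the option occupies 16 octets, of which the last 6
are padding that would otherwise carry whatever the buffer held before.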


# 1.216 23-Jan-2018 maxv

Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.


# 1.215 23-Jan-2018 maxv

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.


Revision tags: tls-maxphys-base-20171202
# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
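
The index-based lookup described in this entry can be sketched in a heavily
simplified form. The table, the structures, and the function below are
hypothetical stand-ins. In particular, the sketch omits the psref(9) and
pserialize(9) protection that makes the real lookup safe against concurrent
interface destruction.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAXIF	8

/* Minimal stand-ins for an interface and a packet header. */
struct sk_ifnet {
	unsigned int index;
	const char *name;
};

struct sk_pkthdr {
	unsigned int rcvif_index;	/* an index is stored, not a pointer */
};

/* Index-to-interface table; a slot is NULL when no interface is attached. */
struct sk_ifnet *sk_ifindex2ifnet[SK_MAXIF];

/*
 * Resolve the receiving interface of a packet. The result may be NULL if
 * the interface went away after the packet was queued, so callers must
 * check it before use.
 */
struct sk_ifnet *
sk_get_rcvif(const struct sk_pkthdr *ph)
{
	if (ph->rcvif_index >= SK_MAXIF)
		return NULL;
	return sk_ifindex2ifnet[ph->rcvif_index];
}
```

The possibility of a NULL result is why later entries in this log add NULL
checks after m_get_rcvif and m_get_rcvif_psref.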


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, e.g. via settimeofday(2), according to time_second(9). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
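
A minimal userland sketch of the idea, not the kernel code: expirations
are kept on a monotonic (uptime) clock and converted to wall-clock time
only at the boundary where userland expects it. The helper names and the
explicit now_* parameters are hypothetical, chosen for illustration.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helpers; the kernel's actual conversion code may differ. */

/* Convert an expiry kept on the uptime clock to wall-clock time. */
static time_t
uptime_to_wall(time_t expire_uptime, time_t now_uptime, time_t now_wall)
{
	return expire_uptime - now_uptime + now_wall;
}

/* Convert a wall-clock expiry supplied by userland to the uptime clock. */
static time_t
wall_to_uptime(time_t expire_wall, time_t now_uptime, time_t now_wall)
{
	return expire_wall - now_wall + now_uptime;
}

/* An entry has expired once the uptime clock reaches its expiry. */
static bool
cache_expired(time_t expire_uptime, time_t now_uptime)
{
	return now_uptime >= expire_uptime;
}
```

Because cache_expired() only compares uptime values, a wall-clock jump
cannot make an entry expire early or live too long.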


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
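
The rules above can be sketched in plain C. This is an illustration
under assumptions, not the route.c code: struct entry, entry_hold() and
entry_release() are hypothetical stand-ins for struct rtentry, the
functions that return an rtentry, and rtfree().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rtentry; only the counter matters. */
struct entry {
	int refcnt;
	int freed;	/* set instead of freeing, so the effect is observable */
};

/* Functions that return an entry take a reference on behalf of the caller. */
static struct entry *
entry_hold(struct entry *e)
{
	if (e != NULL)
		e->refcnt++;
	return e;
}

/* Callers drop references only through this function, never via refcnt--. */
static void
entry_release(struct entry *e)
{
	assert(e != NULL && e->refcnt > 0);
	if (--e->refcnt == 0)
		e->freed = 1;
}
```

As the note above says, the counter only keeps the entry alive; this
sketch has no locking and does not protect against concurrent updates.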


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
Should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
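
Why the array form can stay ABI-compatible, as a sketch: a structure
made only of uint64_t members has the same size and member offsets as an
array of uint64_t. The structure, field names and indices below are
hypothetical, not the real icmp6stat layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style statistics structure (illustrative fields). */
struct old_stats {
	uint64_t errors;
	uint64_t too_short;
	uint64_t bad_checksum;
};

/* Array indices that mirror the old field order. */
enum {
	STAT_ERRORS = 0,
	STAT_TOO_SHORT = 1,
	STAT_BAD_CHECKSUM = 2,
	STAT_NSTATS = 3
};

/* The array must match the structure in size and per-counter offset. */
_Static_assert(sizeof(struct old_stats) == STAT_NSTATS * sizeof(uint64_t),
    "size must match");
_Static_assert(offsetof(struct old_stats, bad_checksum) ==
    STAT_BAD_CHECKSUM * sizeof(uint64_t), "offset must match");

/* Bump one counter by index. */
static void
stat_inc(uint64_t stats[STAT_NSTATS], int idx)
{
	stats[idx]++;
}

/* Copy the array out in the form an old consumer expects. */
static void
stats_export(const uint64_t stats[STAT_NSTATS], struct old_stats *out)
{
	memcpy(out, stats, sizeof(*out));
}
```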


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.
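
A userland sketch of what sockaddr_const_addr() is described as doing,
covering only AF_INET and AF_INET6. It is not the kernel implementation;
the name example_sockaddr_const_addr is hypothetical, and returning NULL
for other families is an assumption made for this sketch.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Return a pointer to the address part of `sa' and, if `slenp' is not
 * NULL, store the length of that address part there.
 */
static const void *
example_sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)
{
	const void *addr;
	socklen_t len;

	switch (sa->sa_family) {
	case AF_INET:
		addr = &((const struct sockaddr_in *)sa)->sin_addr;
		len = sizeof(struct in_addr);
		break;
	case AF_INET6:
		addr = &((const struct sockaddr_in6 *)sa)->sin6_addr;
		len = sizeof(struct in6_addr);
		break;
	default:
		/* Families not handled by this sketch. */
		return NULL;
	}
	if (slenp != NULL)
		*slenp = len;
	return addr;
}
```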


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.
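
A simplified sketch of the distinction, assuming the check combines a
length test with a read-only test; see sys/mbuf.h for the real macro.
struct buf and buf_unwritable() are hypothetical stand-ins for an mbuf
and M_UNWRITABLE().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified buffer; a real mbuf carries much more state. */
struct buf {
	size_t len;	/* bytes contiguous in this buffer */
	bool shared;	/* storage is shared, so it must not be modified */
};

/* Too short or read-only: either way the first `len' bytes need pulling up. */
static bool
buf_unwritable(const struct buf *b, size_t len)
{
	return b->len < len || b->shared;
}
```

A plain length comparison would accept the long but shared buffer, which
is the case the writability check is meant to catch.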


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
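
Item 4 can be sketched as a walk over a per-domain list. This is an
illustration only: struct cache and struct dom are hypothetical,
simplified stand-ins for struct route and struct domain, and a singly
linked list is assumed here in place of the kernel's list type.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct route. */
struct cache {
	struct cache *next;	/* link in the domain's list of live caches */
	void *rt;		/* cached route, NULL when invalid */
};

/* Hypothetical stand-in for struct domain. */
struct dom {
	struct cache *caches;	/* plays the role of dom_rtcache */
};

/* Register a cache that now holds a route. */
static void
cache_insert(struct dom *d, struct cache *c, void *rt)
{
	c->rt = rt;
	c->next = d->caches;
	d->caches = c;
}

/* Invalidate every live cache in the domain and empty the list. */
static void
cache_flushall(struct dom *d)
{
	struct cache *c, *next;

	for (c = d->caches; c != NULL; c = next) {
		next = c->next;
		c->rt = NULL;
		c->next = NULL;
	}
	d->caches = NULL;
}
```

After a flush every user of a cache has to cope with rt being NULL and
look the route up again.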


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
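
The ::1 example above, reduced to a userland sketch. Only the loopback
case is covered; the real scope boundary checks handle more scopes than
this, and the function name src_dst_scope_ok is hypothetical.

```c
#include <stdbool.h>
#include <netinet/in.h>

/*
 * Reject a packet whose source is the loopback address unless the
 * destination is loopback too, since ::1 is only meaningful within a
 * single node.
 */
static bool
src_dst_scope_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
	if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
		return false;
	return true;
}
```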


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
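The alignment test described in 1.83 can be sketched in user-space C as follows. This is a minimal illustration, not the kernel macros: `HDR_ALIGNED_P_SKETCH`, `NO_STRICT_ALIGNMENT_SKETCH` and `needs_realign()` are hypothetical names, and the 4-byte boundary is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the *_HDR_ALIGNED_P() idea: on platforms
 * without strict alignment the test collapses to the constant 1, so
 * the compiler can drop the realignment path entirely.
 */
#ifdef NO_STRICT_ALIGNMENT_SKETCH
#define HDR_ALIGNED_P_SKETCH(p)	1
#else
#define HDR_ALIGNED_P_SKETCH(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/* Returns nonzero when a header pointer would need to be copied up. */
static int
needs_realign(const void *hdr)
{
	assert(hdr != NULL);
	return !HDR_ALIGNED_P_SKETCH(hdr);
}
```

A caller would check `needs_realign()` on the header pointer and only take the copy-and-align path (m_copyup() in the commit) when it returns nonzero.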


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
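The bitmap semantics of net.inet6.icmp6.nodeinfo in 1.67 can be illustrated with a short sketch. The constant and function names below are hypothetical, chosen only for this example; only the bit assignments (2^0 for ping6 -w, 2^1 for ping6 -a) come from the commit message.

```c
#include <assert.h>

/* Hypothetical names for the two bits described in the commit. */
#define NODEINFO_BIT_NAME_SKETCH	0x1	/* 2^0: ping6 -w queries */
#define NODEINFO_BIT_ADDRS_SKETCH	0x2	/* 2^1: ping6 -a queries */

/* Returns nonzero if the sysctl value permits answering this query. */
static int
nodeinfo_allows(int sysctl_value, int query_bit)
{
	assert(query_bit == NODEINFO_BIT_NAME_SKETCH ||
	    query_bit == NODEINFO_BIT_ADDRS_SKETCH);
	return (sysctl_value & query_bit) != 0;
}
```

Under this reading, a value of 3 answers both query kinds, 1 answers only ping6 -w, and 0 disables node information replies.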


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
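The hiwat/lowat admission rule in 1.48 can be sketched roughly as below. This is a simplified illustration, not the kernel logic: `pmtud_may_add()` and its parameters are hypothetical, and the "negative value disables the limit" behaviour is taken from the 1.55 commit message rather than from this one.

```c
#include <assert.h>

/*
 * Decide whether one more PMTUD route entry may be created.
 *   nentries  - number of PMTUD route entries currently installed
 *   validated - nonzero if xx_ctlinput validated the too-big message
 *   hiwat     - upper limit for validated traffic
 *   lowat     - upper limit for unvalidated traffic
 */
static int
pmtud_may_add(int nentries, int validated, int hiwat, int lowat)
{
	int limit = validated ? hiwat : lowat;

	assert(nentries >= 0);
	if (limit < 0)
		return 1;	/* negative watermark: no limit (see 1.55) */
	return nentries < limit;
}
```

Because unvalidated messages are held to the lower watermark, forged too-big messages cannot grow the routing table past `lowat` entries in this model.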


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
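An event-per-second limiter of the kind 1.37 describes can be sketched in portable C as follows. This is an assumption-laden illustration, not the kernel routine: `struct pps_state` and `pps_allow()` are hypothetical, and the caller supplies the current time in whole seconds instead of reading a kernel clock.

```c
#include <assert.h>
#include <stddef.h>

/* Per-limiter state: the second being counted and events seen in it. */
struct pps_state {
	long last_sec;
	int count;
};

/*
 * Returns 1 if the event may proceed, 0 if it exceeds maxpps events
 * within the current second. A negative maxpps means "no limit".
 */
static int
pps_allow(struct pps_state *st, long now_sec, int maxpps)
{
	assert(st != NULL);
	if (maxpps < 0)
		return 1;
	if (st->last_sec != now_sec) {
		/* A new second has started: reset the window. */
		st->last_sec = now_sec;
		st->count = 0;
	}
	if (st->count >= maxpps)
		return 0;
	st->count++;
	return 1;
}
```

An ICMPv6 error path would call `pps_allow()` before building a reply and silently drop the error when it returns 0.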


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.247 11-Sep-2020 roy

inet6: Use generic Neighbor Detection rather than IPv6 specific

No functional change intended.


# 1.246 27-Jul-2020 roy

icmp6: Remove __packed attribute from icmp6 structures

They should naturally align.
Add compile time assertions to icmp6.c to prove this.


# 1.245 12-Jun-2020 roy

Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.244 09-Mar-2020 roy

route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that don't exist on a subnet / prefix.


Revision tags: is-mlppp-base ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.243 06-Oct-2019 uwe

icmp6_notify_error - fix ctlfunc typedef to match pr_ctlinput,
drop the cast that is no longer necessary.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.242 22-Dec-2018 maxv

Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.


# 1.241 22-Dec-2018 maxv

Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.


Revision tags: pgoyette-compat-1126
# 1.240 25-Oct-2018 ozaki-r

Remove a leftover debug printf

Pointed out by hannken@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.239 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
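The truncation hazard that motivates the rename in 1.239 can be reproduced with a small sketch. The names `uimin_sketch`, `MIN_SKETCH`, `min_truncating()` and `min_exact()` are hypothetical and exist only for this example; the sketch assumes `unsigned int` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* The demonstration only makes sense if unsigned int is narrower. */
static_assert(sizeof(unsigned int) < sizeof(uint64_t),
    "sketch assumes unsigned int is narrower than 64 bits");

/* Hypothetical stand-in for a min() defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro; note that it evaluates its arguments twice. */
#define MIN_SKETCH(a, b)	((a) < (b) ? (a) : (b))

/* 64-bit inputs are narrowed to unsigned int before the comparison. */
static uint64_t
min_truncating(uint64_t a, uint64_t b)
{
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}

/* The comparison is done at full 64-bit width. */
static uint64_t
min_exact(uint64_t a, uint64_t b)
{
	return MIN_SKETCH(a, b);
}
```

With a first argument of 2^32 + 1 and a second of 5, the narrowing version compares 1 against 5 and returns 1, while the macro version returns 5. Spelling the function `uimin` makes that narrowing visible at the call site.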


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.238 01-Jun-2018 ozaki-r

branches: 1.238.2;
Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that keeps the
rtentry from being accidentally released. So rt_timer handlers must release the
reference of a passed rtentry by themselves (but they didn't).


Revision tags: pgoyette-compat-0521
# 1.237 07-May-2018 maxv

Remove misleading comments.


Revision tags: pgoyette-compat-0502
# 1.236 01-May-2018 maxv

Remove now unused net_osdep.h includes, the other BSDs did the same.


# 1.235 29-Apr-2018 maxv

Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.


# 1.234 28-Apr-2018 maxv

Remove unused ipsec_var.h includes.


# 1.233 27-Apr-2018 maxv

Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of
MCLBYTES, so the area allocated is still too small.

I think it should have been MEXTMALLOC, and of course I can't test my
change.


# 1.232 26-Apr-2018 maxv

Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.


# 1.231 26-Apr-2018 maxv

Use M_UNWRITABLE, no functional change.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.230 14-Apr-2018 maxv

Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to craft reliably a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.


# 1.229 14-Apr-2018 maxv

Cosmetic, and remove one XXX (no problem).


# 1.228 14-Apr-2018 maxv

Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.


# 1.227 14-Apr-2018 maxv

Remove dead code. It is the same as the non-obsolete one, since
ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE,
and the code leads to the same errno value (EHOSTUNREACH).


# 1.226 12-Apr-2018 maxv

Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.


# 1.225 12-Apr-2018 maxv

Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.224 21-Mar-2018 roy

Sprinkle more soroverflow().


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.223 28-Feb-2018 maxv

branches: 1.223.2;
Remove unused ipsec_private.h includes.


# 1.222 26-Feb-2018 maxv

Remove redundant condition (harmless). PR/53030.


# 1.221 26-Feb-2018 maxv

Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@


# 1.220 12-Feb-2018 maxv

Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.


# 1.219 23-Jan-2018 maxv

Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.


# 1.218 23-Jan-2018 maxv

Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.


# 1.217 23-Jan-2018 maxv

Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
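The fix described in 1.217 amounts to zeroing the bytes that the roundup adds. A minimal user-space sketch of the idea is shown below; `fill_lladdr_opt()`, `struct opt_hdr_sketch` and `ROUNDUP8_SKETCH` are hypothetical names, and the buffer stands in for the mbuf data area.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ROUNDUP8_SKETCH(x)	(((x) + (size_t)7) & ~(size_t)7)

/* Hypothetical 2-byte option header: type, length in 8-byte units. */
struct opt_hdr_sketch {
	unsigned char type;
	unsigned char len;
};

/*
 * Writes a target link-layer address option into buf and returns the
 * number of bytes used, or 0 if it does not fit. The padding created
 * by rounding up to a multiple of 8 is explicitly zeroed so that no
 * stale buffer contents reach the wire.
 */
static size_t
fill_lladdr_opt(unsigned char *buf, size_t buflen,
    const unsigned char *lladdr, size_t addrlen)
{
	size_t used = sizeof(struct opt_hdr_sketch) + addrlen;
	size_t total = ROUNDUP8_SKETCH(used);

	assert(buf != NULL && lladdr != NULL);
	if (total > buflen)
		return 0;
	buf[0] = 2;				/* target link-layer address */
	buf[1] = (unsigned char)(total >> 3);	/* length in 8-byte units */
	memcpy(buf + sizeof(struct opt_hdr_sketch), lladdr, addrlen);
	memset(buf + used, 0, total - used);	/* clear the padding */
	return total;
}
```

With an 8-byte address the option occupies 10 bytes and is rounded up to 16, so the final `memset()` clears the 6 padding bytes that would otherwise carry whatever was previously in the buffer.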


# 1.216 23-Jan-2018 maxv

Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.


# 1.215 23-Jan-2018 maxv

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.


Revision tags: tls-maxphys-base-20171202
# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that a icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure the ifp returned from in6_select* functions is psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
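
The idea of storing an index rather than a pointer can be sketched in plain C. This is an illustrative userland model under stated assumptions, not the kernel implementation: struct ifnet_model, struct pkt_model, index2if[] and the helper functions are hypothetical, and the sketch leaves out the psref(9)/pserialize(9) protection that keeps the interface alive between the lookup and the matching put in the real API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IFS 8	/* hypothetical table size */

struct ifnet_model {
	int if_index;
	const char *if_xname;
};

/* Models the interface collection; a slot is NULL once the interface is gone. */
static struct ifnet_model *index2if[MAX_IFS];

/* Models a packet header that records only the receiving interface index. */
struct pkt_model {
	int rcvif_index;
};

static void
if_attach_model(struct ifnet_model *ifp)
{
	assert(ifp->if_index > 0 && ifp->if_index < MAX_IFS);
	index2if[ifp->if_index] = ifp;
}

static void
if_detach_model(struct ifnet_model *ifp)
{
	index2if[ifp->if_index] = NULL;
}

static void
pkt_set_rcvif(struct pkt_model *m, const struct ifnet_model *ifp)
{
	m->rcvif_index = ifp->if_index;
}

/* Returns NULL if the interface no longer exists; callers must check. */
static struct ifnet_model *
pkt_get_rcvif(const struct pkt_model *m)
{
	if (m->rcvif_index <= 0 || m->rcvif_index >= MAX_IFS)
		return NULL;
	return index2if[m->rcvif_index];
}
```

With a raw pointer in the packet, a detach would leave the packet pointing at freed memory; with an index, the lookup simply returns NULL, which is why callers need a NULL check.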


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receiving interface of an mbuf.
The functions are counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif operations and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The outif obtained there is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
  - RTF_CONNECTED is added
    - It has the same value as RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  an L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

Details of the behavior changes can be seen in the diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, say by settimeofday(2), according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
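
The conversion between the two time bases mentioned above is simple arithmetic. The sketch below is a userland model under stated assumptions: struct clocks, mono_to_wall(), wall_to_mono() and expired() are hypothetical names, and the two fields merely stand in for the kernel's time_uptime and time_second values.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical snapshot of the two clocks, in seconds. */
struct clocks {
	time_t uptime;	/* monotonic: seconds since boot, never jumps */
	time_t second;	/* wall clock: may leap when the time is set */
};

/* Convert a monotonic timestamp to wall-clock time for userland. */
static time_t
mono_to_wall(const struct clocks *c, time_t t)
{
	assert(c != NULL);
	return t - c->uptime + c->second;
}

/* Convert a wall-clock timestamp from userland to monotonic time. */
static time_t
wall_to_mono(const struct clocks *c, time_t t)
{
	assert(c != NULL);
	return t - c->second + c->uptime;
}

/* Expiry is decided on the monotonic clock only. */
static int
expired(const struct clocks *c, time_t expire_mono)
{
	assert(c != NULL);
	return c->uptime >= expire_mono;
}
```

Because expired() never consults the wall clock, setting the system time forward or backward neither expires a cache entry early nor keeps it alive too long.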


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
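
The three rules listed above amount to an ordinary reference-counting discipline. The following is a simplified userland model, not the kernel's rtentry handling: struct rt_model, rt_create(), rt_lookup(), rt_release() and the live_entries counter are hypothetical, and the model also counts the creator's reference, which is a simplification made for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

struct rt_model {
	int refcnt;
};

/* Number of entries currently allocated, for observation only. */
static int live_entries;

/* Create an entry holding one reference owned by the creator. */
static struct rt_model *
rt_create(void)
{
	struct rt_model *rt = calloc(1, sizeof(*rt));

	if (rt == NULL)
		return NULL;
	rt->refcnt = 1;
	live_entries++;
	return rt;
}

/* A function that returns an entry takes a reference for the caller. */
static struct rt_model *
rt_lookup(struct rt_model *rt)
{
	rt->refcnt++;
	return rt;
}

/* The only way to drop a reference; frees the entry on the last one. */
static void
rt_release(struct rt_model *rt)
{
	assert(rt->refcnt > 0);
	if (--rt->refcnt == 0) {
		live_entries--;
		free(rt);
	}
}
```

As the commit message notes, a count like this only prevents the entry from being freed while in use; it does nothing to stop concurrent updates, which need separate protection.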


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if ipsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls, from OpenBSD, to avoid ipv6 DoS attacks


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
Should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
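
Why a structure of counters and an array of uint64_t can be ABI-compatible is easy to check in a small model. The sketch below is illustrative only: struct old_stat, the STAT_* indices, stat_inc() and stat_export() are hypothetical and do not reproduce the real icmp6stat members. It assumes every member of the old structure is a 64-bit counter and that the indices follow the member order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style structure: every member is a 64-bit counter. */
struct old_stat {
	uint64_t s_error;
	uint64_t s_toofreq;
	uint64_t s_badcode;
};

/* New-style indices, listed in the same order as the old members. */
enum {
	STAT_ERROR,
	STAT_TOOFREQ,
	STAT_BADCODE,
	STAT_NSTATS
};

/* The layouts agree: same total size, same offset for each counter. */
_Static_assert(sizeof(struct old_stat) == STAT_NSTATS * sizeof(uint64_t),
    "size mismatch");
_Static_assert(offsetof(struct old_stat, s_badcode) ==
    STAT_BADCODE * sizeof(uint64_t), "offset mismatch");

static uint64_t stats[STAT_NSTATS];

static void
stat_inc(unsigned int idx)
{
	assert(idx < STAT_NSTATS);
	stats[idx]++;
}

/* What an old consumer receives: the array bytes viewed as the structure. */
static void
stat_export(struct old_stat *out)
{
	memcpy(out, stats, sizeof(*out));
}
```

An old consumer that expects the structure reads the same bytes at the same offsets, so it keeps working as long as new counters are only appended at the end.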


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.
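
The difference between the two checks can be shown with a small model. This sketch assumes, as the commit message implies, that M_UNWRITABLE(m, len) is true when the mbuf holds fewer than len contiguous bytes or when its storage is read-only; struct buf_model and both helper functions are hypothetical and are not the mbuf(9) API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two mbuf properties that matter here. */
struct buf_model {
	size_t len;	/* contiguous bytes available in this buffer */
	bool readonly;	/* storage is shared and must not be modified */
};

/* Old check: only asks whether enough bytes are contiguous. */
static bool
too_short(const struct buf_model *b, size_t len)
{
	assert(b != NULL);
	return b->len < len;
}

/* New check: also true when the bytes are present but not writable. */
static bool
unwritable(const struct buf_model *b, size_t len)
{
	assert(b != NULL);
	return b->len < len || b->readonly;
}
```

A long but read-only buffer passes the old test and fails the new one, which is the case where a pullup is still needed before the header is modified in place.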


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
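
Item 4 above, a per-domain list of live route caches that can be invalidated in one pass, can be sketched as follows. This is a userland model under stated assumptions, not the kernel code: struct route_model, struct domain_model, rtcache_insert() and rtflushall_model() are hypothetical names, and a plain singly linked list stands in for whatever list type the kernel uses.

```c
#include <assert.h>
#include <stddef.h>

struct rtentry_model {
	int unused;
};

struct route_model {
	struct rtentry_model *ro_rt;	/* cached route, NULL when invalid */
	struct route_model *next;	/* link in the domain's list */
};

struct domain_model {
	struct route_model *rtcache_head;	/* live caches only */
};

/* Record a freshly cached route on the domain's list. */
static void
rtcache_insert(struct domain_model *dom, struct route_model *ro,
    struct rtentry_model *rt)
{
	assert(rt != NULL);
	ro->ro_rt = rt;
	ro->next = dom->rtcache_head;
	dom->rtcache_head = ro;
}

/* Invalidate every live cache in the domain; returns how many there were. */
static int
rtflushall_model(struct domain_model *dom)
{
	struct route_model *ro, *next;
	int n = 0;

	for (ro = dom->rtcache_head; ro != NULL; ro = next) {
		next = ro->next;
		ro->ro_rt = NULL;
		ro->next = NULL;
		n++;
	}
	dom->rtcache_head = NULL;
	return n;
}
```

After a flush, every user of a cache sees ro_rt == NULL and has to look the route up again, which is why callers must be prepared for a cached route to turn to NULL.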


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
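
The alignment test described above can be sketched in userland C. This is
only an illustration under stated assumptions: the macro and function names
ending in _SKETCH / _sketch are hypothetical, the 4-byte boundary is assumed,
and NO_STRICT_ALIGNMENT_SKETCH stands in for __NO_STRICT_ALIGNMENT.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a *_HDR_ALIGNED_P() macro.  When the platform
 * has no strict alignment constraints the test collapses to the constant 1,
 * so the compiler can drop the realignment path entirely.
 */
#ifdef NO_STRICT_ALIGNMENT_SKETCH
#define HDR_ALIGNED_P_SKETCH(p)	1
#else
#define HDR_ALIGNED_P_SKETCH(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/* True when a header at p would have to be copied to an aligned location. */
static bool
hdr_needs_realign_sketch(const void *p)
{
	return !HDR_ALIGNED_P_SKETCH(p);
}
```

A caller would run the copy-and-align step only when
hdr_needs_realign_sketch() returns true, which is why well-behaved drivers
never reach the new path.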


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
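
A minimal sketch of how a bitmap-valued setting like the one above can be
tested. The flag and function names are hypothetical; only the bit
positions (2^0 and 2^1) come from the commit message.

```c
#include <stdbool.h>

/* Hypothetical names for the two bits named in the commit message. */
#define NODEINFO_BIT_W_SKETCH	(1 << 0)	/* 2^0: ping6 -w */
#define NODEINFO_BIT_A_SKETCH	(1 << 1)	/* 2^1: ping6 -a */

/* True when the given flag bit is set in the nodeinfo value. */
static bool
nodeinfo_enabled_sketch(int nodeinfo, int flag)
{
	return (nodeinfo & flag) != 0;
}
```

Under this reading, a value of 3 enables both query types and a value of 0
disables both.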


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
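
The hiwat/lowat policy above can be sketched as a single admission check.
This is an illustration, not the kernel code: the function name and the
exact comparison are assumptions, and the "negative value disables the
limit" rule is taken from the later 1.55 log entry.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether one more PMTUD route entry may be created.
 * validated: the "too big" message was validated by xx_ctlinput.
 * count:     number of PMTUD route entries that already exist.
 * A negative watermark is treated as "no limit".
 */
static bool
pmtud_entry_allowed_sketch(bool validated, int count, int hiwat, int lowat)
{
	int limit = validated ? hiwat : lowat;

	assert(count >= 0);
	if (limit < 0)
		return true;
	return count < limit;
}
```

Unvalidated traffic hits the lower watermark first, so it cannot grow the
routing table without bound.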


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
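
A userland sketch of event-per-second rate limiting, as described above.
It is not the kernel interface: the struct, the function name, and the
explicit "now" parameter (whole seconds) are assumptions made so the logic
can be followed and tested in isolation.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-limiter state: current one-second window and its count. */
struct pps_state_sketch {
	time_t	window;	/* second in which the counted events occurred */
	int	count;	/* events allowed so far in that second */
};

/*
 * Return true if an event at time 'now' is within the per-second budget.
 * A negative maxpps disables the limit.
 */
static bool
pps_allow_sketch(struct pps_state_sketch *st, time_t now, int maxpps)
{
	assert(st != NULL);
	if (st->window != now) {
		/* A new second has started: reset the budget. */
		st->window = now;
		st->count = 0;
	}
	if (maxpps < 0)
		return true;
	if (st->count >= maxpps)
		return false;
	st->count++;
	return true;
}
```

An ICMPv6 error path would call such a check once per candidate error
message and drop the message when it returns false.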


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.
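
A sketch of the idea, assuming the usual IPv6 header layout in which the
first byte holds the 4-bit version in its high nibble and the top 4 bits
of the traffic class in its low nibble. The constant and function names
are hypothetical.

```c
#include <stdint.h>

#define VERSION_SKETCH		0x60u	/* version 6 in the high nibble */
#define VERSION_MASK_SKETCH	0xf0u	/* bits occupied by the version */

/*
 * Set the version nibble of the first header byte while preserving the
 * low nibble, which carries traffic class bits.
 */
static uint8_t
set_ip6_version_sketch(uint8_t vfc)
{
	vfc &= (uint8_t)~VERSION_MASK_SKETCH;	/* clear old version only */
	vfc |= VERSION_SKETCH;			/* write version 6 */
	return vfc;
}
```

A plain assignment of the version constant would zero the low nibble,
which is the kind of overwrite the commit avoids.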


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.246 27-Jul-2020 roy

icmp6: Remove __packed attribute from icmp6 structures

They should naturally align.
Add compile-time assertions to icmp6.c to prove this.


# 1.245 12-Jun-2020 roy

Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.244 09-Mar-2020 roy

route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that don't exist on a subnet / prefix.


Revision tags: is-mlppp-base ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.243 06-Oct-2019 uwe

icmp6_notify_error - fix ctlfunc typedef to match pr_ctlinput,
drop the cast that is no longer necessary.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.242 22-Dec-2018 maxv

Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.


# 1.241 22-Dec-2018 maxv

Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.


Revision tags: pgoyette-compat-1126
# 1.240 25-Oct-2018 ozaki-r

Remove a leftover debug printf

Pointed out by hannken@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.239 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
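
The truncation hazard described above can be sketched in userland C. The
names uimin_sketch, MIN_SKETCH and truncating_min_sketch are hypothetical;
they only mirror the shape of a min() defined on unsigned int and of a
type-preserving macro, and are not the libkern definitions.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for a min() defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro: no truncation, but arguments are evaluated twice. */
#define MIN_SKETCH(a, b)	((a) < (b) ? (a) : (b))

/*
 * Pass 64-bit values through the unsigned int function.  The explicit
 * casts show the conversion that otherwise happens silently at the call.
 */
static uint64_t
truncating_min_sketch(uint64_t a, uint64_t b)
{
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}
```

With a = UINT_MAX + 2, the conversion to unsigned int leaves 1, so the
function reports 1 as the minimum of a and 5, while the macro yields 5.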


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.238 01-Jun-2018 ozaki-r

branches: 1.238.2;
Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that keeps the
rtentry from being accidentally released. So rt_timer handlers must release the
reference of a passed rtentry by themselves (but they didn't).


Revision tags: pgoyette-compat-0521
# 1.237 07-May-2018 maxv

Remove misleading comments.


Revision tags: pgoyette-compat-0502
# 1.236 01-May-2018 maxv

Remove now unused net_osdep.h includes, the other BSDs did the same.


# 1.235 29-Apr-2018 maxv

Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.


# 1.234 28-Apr-2018 maxv

Remove unused ipsec_var.h includes.


# 1.233 27-Apr-2018 maxv

Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of
MCLBYTES, so the area allocated is still too small.

I think it should have been MEXTMALLOC, and of course I can't test my
change.


# 1.232 26-Apr-2018 maxv

Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.


# 1.231 26-Apr-2018 maxv

Use M_UNWRITABLE, no functional change.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.230 14-Apr-2018 maxv

Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to craft reliably a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.


# 1.229 14-Apr-2018 maxv

Cosmetic, and remove one XXX (no problem).


# 1.228 14-Apr-2018 maxv

Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.


# 1.227 14-Apr-2018 maxv

Remove dead code. It is the same as the non-obsolete one, since
ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE,
and the code leads to the same errno value (EHOSTUNREACH).


# 1.226 12-Apr-2018 maxv

Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.


# 1.225 12-Apr-2018 maxv

Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.224 21-Mar-2018 roy

Sprinkle more soroverflow().


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.223 28-Feb-2018 maxv

branches: 1.223.2;
Remove unused ipsec_private.h includes.


# 1.222 26-Feb-2018 maxv

Remove redundant condition (harmless). PR/53030.


# 1.221 26-Feb-2018 maxv

Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@


# 1.220 12-Feb-2018 maxv

Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.


# 1.219 23-Jan-2018 maxv

Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.


# 1.218 23-Jan-2018 maxv

Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.


# 1.217 23-Jan-2018 maxv

Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
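
A userland sketch of this class of leak and one way to close it: zero the
whole rounded-up slot before filling it. The struct, macro and function
names are hypothetical, and the 2-byte type/length option header is an
assumption based on the usual neighbor discovery option layout; this is
not the committed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round x up to the next multiple of y (y > 0). */
#define ROUNDUP_SKETCH(x, y)	((((x) + (y) - 1) / (y)) * (y))

/* Hypothetical option header: type, then length in units of 8 bytes. */
struct opt_hdr_sketch {
	uint8_t	type;
	uint8_t	len;
};

/*
 * Write a link-layer address option into buf and return its padded length.
 * The memset covers the padding added by the roundup, so no stale bytes
 * from the buffer are sent along with the option.
 */
static size_t
fill_lladdr_opt_sketch(uint8_t *buf, size_t buflen, const uint8_t *lladdr,
    size_t addrlen)
{
	size_t len = ROUNDUP_SKETCH(sizeof(struct opt_hdr_sketch) + addrlen, 8);

	assert(len <= buflen && len / 8 <= UINT8_MAX);
	memset(buf, 0, len);
	buf[0] = 2;			/* target link-layer address option */
	buf[1] = (uint8_t)(len / 8);
	memcpy(buf + sizeof(struct opt_hdr_sketch), lladdr, addrlen);
	return len;
}
```

With a 6-byte Ethernet address the option is exactly 8 bytes and has no
padding; with an 8-byte address the slot grows to 16 bytes and the last 6
are padding that must be cleared.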


# 1.216 23-Jan-2018 maxv

Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.


# 1.215 23-Jan-2018 maxv

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.
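
Why the ICMP6 header offset cannot be assumed to be sizeof(struct ip6_hdr)
can be sketched as follows. This is a simplified user-space illustration
over a raw byte buffer, not the kernel's mbuf code: icmp6_offset() is a
hypothetical helper, and it only skips hop-by-hop, routing and destination
options headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP6_HDRLEN	40	/* fixed IPv6 header */
#define PROTO_HOPOPTS	0
#define PROTO_ROUTING	43
#define PROTO_ICMPV6	58
#define PROTO_DSTOPTS	60

/*
 * Return the offset of the ICMPv6 header inside pkt, or -1 if it is not
 * found. Fragment, AH and ESP headers are not handled in this sketch.
 */
static int
icmp6_offset(const uint8_t *pkt, size_t len)
{
	size_t off = IP6_HDRLEN;
	uint8_t nxt;

	if (len < IP6_HDRLEN)
		return -1;
	nxt = pkt[6];		/* Next Header field of the fixed header */

	while (nxt == PROTO_HOPOPTS || nxt == PROTO_ROUTING ||
	    nxt == PROTO_DSTOPTS) {
		size_t extlen;

		if (len - off < 2)
			return -1;
		/* Length byte counts 8-byte units beyond the first 8 bytes. */
		extlen = ((size_t)pkt[off + 1] + 1) * 8;
		if (extlen > len - off)
			return -1;
		nxt = pkt[off];
		off += extlen;
	}
	if (nxt != PROTO_ICMPV6 || off >= len)
		return -1;
	return (int)off;
}
```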


Revision tags: tls-maxphys-base-20171202
# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT
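
The idea, sketched with C11 _Static_assert as a stand-in for the kernel's
CTASSERT macro. struct hdr and hdr_size() are hypothetical; the size limit
is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct hdr {
	uint8_t  type;
	uint8_t  code;
	uint16_t cksum;
};

/*
 * Run-time style, only noticed when the code path is executed in a
 * kernel built with DIAGNOSTIC:
 *
 *	if (sizeof(struct hdr) > 8)
 *		panic("hdr too big");
 *
 * Compile-time style: the build fails instead, on every configuration.
 */
_Static_assert(sizeof(struct hdr) <= 8, "hdr must fit in 8 bytes");

static size_t
hdr_size(void)
{
	return sizeof(struct hdr);
}
```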


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.
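
The calling convention can be sketched in user space as follows. A
pthread mutex stands in for the pserialize critical section, and
set_src()/select_src() are hypothetical names; the real in6_selectsrc
takes more arguments. The result is copied into caller storage while the
protection is held, instead of returning a pointer that outlives it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>

static pthread_mutex_t addr_lock = PTHREAD_MUTEX_INITIALIZER;
static struct in6_addr current_src;	/* may be rewritten by another thread */
static int have_src;

/* Writer side: publish a new source address. */
static void
set_src(const struct in6_addr *a)
{
	pthread_mutex_lock(&addr_lock);
	memcpy(&current_src, a, sizeof(current_src));
	have_src = 1;
	pthread_mutex_unlock(&addr_lock);
}

/* Reader side: copy the result to *ret inside the critical section. */
static int
select_src(struct in6_addr *ret)
{
	int error = 0;

	assert(ret != NULL);
	pthread_mutex_lock(&addr_lock);
	if (have_src)
		memcpy(ret, &current_src, sizeof(*ret));
	else
		error = EADDRNOTAVAIL;
	pthread_mutex_unlock(&addr_lock);
	return error;
}
```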


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
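
The index-instead-of-pointer idea can be sketched as follows. This is a
single-threaded user-space illustration with hypothetical names
(index2if, packet_get_rcvif); the kernel additionally protects the
lookup with psref(9) or pserialize(9), which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IFINDEX	8

struct iface {
	const char *if_name;
};

/* Interface collection: index -> interface, NULL once detached. */
static struct iface *index2if[MAX_IFINDEX];

/* A packet remembers the index, not a pointer that could dangle. */
struct packet {
	unsigned int rcvif_index;
};

static void
if_attach(unsigned int idx, struct iface *ifp)
{
	assert(idx > 0 && idx < MAX_IFINDEX);
	index2if[idx] = ifp;
}

static void
if_detach(unsigned int idx)
{
	assert(idx > 0 && idx < MAX_IFINDEX);
	index2if[idx] = NULL;
}

/* Returns NULL if the interface is gone; callers must check. */
static struct iface *
packet_get_rcvif(const struct packet *p)
{
	if (p->rcvif_index == 0 || p->rcvif_index >= MAX_IFINDEX)
		return NULL;
	return index2if[p->rcvif_index];
}
```

The NULL return is why later revisions add NULL checks after
m_get_rcvif and m_get_rcvif_psref.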


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
  - RTF_CONNECTED is added
    - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

For details of the behavior changes, see the diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
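
A user-space analogy of the two clocks and the conversion between them,
assuming CLOCK_MONOTONIC plays the role of time_uptime and CLOCK_REALTIME
the role of time_second. The helper names are hypothetical and are not
the kernel's conversion routines.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

static time_t
clock_seconds(clockid_t id)
{
	struct timespec ts;
	int rc = clock_gettime(id, &ts);

	assert(rc == 0);
	(void)rc;
	return ts.tv_sec;
}

/* Monotonic: never leaps when the wall clock is set. */
static time_t
uptime_now(void)
{
	return clock_seconds(CLOCK_MONOTONIC);
}

/* Wall clock: can leap, e.g. after settimeofday(2). */
static time_t
wallclock_now(void)
{
	return clock_seconds(CLOCK_REALTIME);
}

/* Convert an uptime-based timestamp for programs that expect wall time. */
static time_t
uptime_to_wallclock(time_t t)
{
	return t - uptime_now() + wallclock_now();
}

/* And the reverse, for values handed in by such programs. */
static time_t
wallclock_to_uptime(time_t t)
{
	return t - wallclock_now() + uptime_now();
}
```

Expirations are kept internally on the monotonic scale and converted
only at the boundary.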


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
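
The three rules above can be sketched with a generic reference-counted
object. This is a single-threaded user-space illustration with
hypothetical names, not the rtentry code; as noted, a count like this
prevents freeing but does not protect against concurrent updates.

```c
#include <assert.h>
#include <stdlib.h>

struct entry {
	int refcnt;
};

static int nfreed;	/* counts destroyed entries, for illustration */

/* Create an entry holding one reference (the table's own). */
static struct entry *
entry_create(void)
{
	struct entry *e = calloc(1, sizeof(*e));

	if (e != NULL)
		e->refcnt = 1;
	return e;
}

/* A function returning an entry takes a reference for the caller. */
static struct entry *
entry_lookup(struct entry *e)
{
	e->refcnt++;
	return e;
}

/* Always release through this function, never with a bare refcnt--. */
static void
entry_release(struct entry *e)
{
	assert(e->refcnt > 0);
	if (--e->refcnt == 0) {
		free(e);
		nfreed++;
	}
}
```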


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
Should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
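
Why a flat array can stay ABI-compatible with the old structure, sketched
with hypothetical names (struct old_stat, STAT_*), not the real icmp6
counters. If the structure holds nothing but 64-bit counters and the
indices follow the field order, both layouts are byte-for-byte the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Old style: a structure of 64-bit counters, as old binaries expect. */
struct old_stat {
	uint64_t s_error;
	uint64_t s_tooshort;
	uint64_t s_checksum;
};

/* New style: indices into a flat array, in the same order as the fields. */
enum { STAT_ERROR, STAT_TOOSHORT, STAT_CHECKSUM, STAT_NSTATS };

static uint64_t stats[STAT_NSTATS];

#define STAT_INC(x)	(stats[(x)]++)

/* The two layouts must match for old consumers to keep working. */
_Static_assert(sizeof(struct old_stat) == sizeof(stats),
    "size mismatch");
_Static_assert(offsetof(struct old_stat, s_checksum) ==
    STAT_CHECKSUM * sizeof(uint64_t), "offset mismatch");

/* Export the counters in the layout an old binary would read. */
static void
stats_export(struct old_stat *out)
{
	memcpy(out, stats, sizeof(*out));
}
```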


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
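
The difference, in a small sketch with made-up functions. Before C23, an
empty parameter list in a declaration leaves the parameters unspecified,
while (void) states that there are none.

```c
#include <assert.h>

/*
 * Old-style (K&R) definition, shown only as a comment:
 *
 *	int
 *	add(a, b)
 *		int a;
 *		int b;
 *	{
 *		return a + b;
 *	}
 */

/* ANSI definition: parameter types are part of the prototype. */
static int
add(int a, int b)
{
	return a + b;
}

/* (void), not (): the compiler can now reject answer(1, 2). */
static int
answer(void)
{
	return 42;
}
```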


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.
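
The copy-the-fields approach can be sketched as follows, with
hypothetical structure and field names rather than the real ip6aux and
in6_ifaddr layouts. The tag keeps values, so it stays valid even if the
address object it was filled from goes away.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the interface address object, which may be freed. */
struct ifaddr6 {
	struct in6_addr ia_addr;
	uint32_t	ia_scope_id;
	int		ia_flags;
};

/* Stand-in for the per-packet tag payload: values, no pointer. */
struct pkt_aux {
	struct in6_addr pa_dst;
	uint32_t	pa_scope_id;
	int		pa_flags;
};

/* Load the interesting fields; no reference to ia is kept afterwards. */
static void
pkt_aux_setdst(struct pkt_aux *aux, const struct ifaddr6 *ia)
{
	assert(aux != NULL && ia != NULL);
	memcpy(&aux->pa_dst, &ia->ia_addr, sizeof(aux->pa_dst));
	aux->pa_scope_id = ia->ia_scope_id;
	aux->pa_flags = ia->ia_flags;
}
```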


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
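
The consequence of item 3 for callers can be sketched in user space as
follows. malloc() stands in for the per-family sockaddr pool, and
struct route_sketch with its three functions are hypothetical analogues
of struct route, rtcache_setdst(), rtcache_getdst() and rtcache_free(),
not their real implementations. Since the cache only points at its
destination, setting it can fail and reading it can yield NULL.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* The cache holds a pointer to the destination, not embedded storage. */
struct route_sketch {
	struct sockaddr *ro_sa;
};

/* Returns 0 on success, or ENOMEM if no storage is available. */
static int
route_setdst(struct route_sketch *ro, const struct sockaddr *sa,
    socklen_t len)
{
	struct sockaddr *copy;

	assert(len >= sizeof(sa->sa_family));
	copy = malloc(len);
	if (copy == NULL)
		return ENOMEM;
	memcpy(copy, sa, len);
	free(ro->ro_sa);
	ro->ro_sa = copy;
	return 0;
}

/* May return NULL, e.g. if route_setdst() failed or was never called. */
static const struct sockaddr *
route_getdst(const struct route_sketch *ro)
{
	return ro->ro_sa;
}

static void
route_free(struct route_sketch *ro)
{
	free(ro->ro_sa);
	ro->ro_sa = NULL;
}
```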


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
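
The scope boundary example above (src=::1, dst=global) can be illustrated with a small userland sketch. It is not the kernel's scope-checking code: the function name scope_violation_sketch is hypothetical, and it covers only the loopback case mentioned in the commit message, using the standard IN6_IS_ADDR_LOOPBACK macro.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: returns 1 if a packet with this source and
 * destination would leak a loopback source address off the node,
 * 0 if not, and -1 if either string is not a valid IPv6 address.
 */
static int
scope_violation_sketch(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return -1;

	/* ::1 is only meaningful within a single node. */
	return IN6_IS_ADDR_LOOPBACK(&s) && !IN6_IS_ADDR_LOOPBACK(&d);
}
```

Under this sketch, scope_violation_sketch("::1", "2001:db8::1") reports a violation, while loopback-to-loopback traffic does not.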

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
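
The idea behind the *_HDR_ALIGNED_P() macros described above can be sketched in userland as follows. This is an illustration only: the macro name HDR_ALIGNED_P_SKETCH, the SKETCH_NO_STRICT_ALIGNMENT switch, and the 4-byte alignment requirement are assumptions made for the example, not the definitions used in the tree.

```c
#include <stdint.h>

/*
 * Hypothetical analogue of an *_HDR_ALIGNED_P() macro. When the
 * platform has no strict alignment constraints the test collapses to
 * the constant 1, so the compiler can drop the realignment path.
 */
#ifdef SKETCH_NO_STRICT_ALIGNMENT
#define HDR_ALIGNED_P_SKETCH(p)	1
#else
/* Assumed requirement: the header must start on a 4-byte boundary. */
#define HDR_ALIGNED_P_SKETCH(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/* Returns nonzero if the header at p can be accessed in place. */
static int
hdr_is_aligned(const void *p)
{
	return HDR_ALIGNED_P_SKETCH(p);
}
```

A caller would test the header pointer first and fall back to a copying routine (m_copyup() in the commit above) only when the test fails.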


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
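
The bitmap interpretation of net.inet6.icmp6.nodeinfo described above can be sketched like this. The macro and function names are hypothetical; only the bit positions (2^0 for ping6 -w, 2^1 for ping6 -a) come from the commit message.

```c
/* Hypothetical names; the commit message only gives the bit positions. */
#define NODEINFO_BIT_W_SKETCH	(1 << 0)	/* 2^0: queries sent by ping6 -w */
#define NODEINFO_BIT_A_SKETCH	(1 << 1)	/* 2^1: queries sent by ping6 -a */

/* Returns nonzero if the sysctl value enables the given query type. */
static int
nodeinfo_allows(int sysctl_value, int query_bit)
{
	return (sysctl_value & query_bit) != 0;
}
```

With this reading, a value of 1 answers only ping6 -w queries, 2 answers only ping6 -a queries, 3 answers both, and 0 answers neither.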


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
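
An event-per-second limiter of the kind described above can be sketched as below. This is a simplified userland illustration, not the kernel routine: the struct and function names are hypothetical, and the caller passes the current time in whole seconds so the behaviour is easy to follow.

```c
/* Hypothetical limiter state; one instance per rate-limited event type. */
struct pps_state {
	long	window_start;	/* second in which the current window began */
	int	count;		/* events seen in the current window */
	int	started;	/* nonzero once the first event has been seen */
};

/*
 * Returns 1 if the event is within maxpps events for the current
 * one-second window, 0 if it should be suppressed.
 */
static int
pps_check_sketch(struct pps_state *st, long now, int maxpps)
{
	/* Open a new window on the first event or after a second has passed. */
	if (!st->started || now - st->window_start >= 1) {
		st->window_start = now;
		st->count = 0;
		st->started = 1;
	}
	st->count++;
	return st->count <= maxpps;
}
```

An ICMPv6 error path would call such a check before building a reply and drop the error when it returns 0.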


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield changes from the 04 draft to the 05 draft, which make
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.
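
The fix above concerns the first byte of the IPv6 header, whose high nibble holds the version and whose low nibble holds the top four bits of the traffic class. A minimal sketch of the two ways of writing the version follows; the macro and function names are hypothetical stand-ins, and the 0x60/0xf0 values are the usual version and mask constants for that byte.

```c
#include <stdint.h>

#define VERSION_SKETCH		0x60	/* version 6 in the high nibble */
#define VERSION_MASK_SKETCH	0xf0	/* bits occupied by the version */

/* Sets the version nibble and keeps the traffic class bits in the low nibble. */
static uint8_t
set_version_keep_class(uint8_t vfc)
{
	return (uint8_t)((vfc & ~VERSION_MASK_SKETCH) | VERSION_SKETCH);
}

/* The overwriting form, for contrast: the traffic class bits are lost. */
static uint8_t
set_version_clobber(uint8_t vfc)
{
	(void)vfc;	/* previous contents are ignored on purpose */
	return VERSION_SKETCH;
}
```

Masking before OR-ing is what lets a reflected or rebuilt header carry the original traffic class bits through unchanged.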


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.245 12-Jun-2020 roy

Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base phil-wifi-20200406
# 1.244 09-Mar-2020 roy

route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that doesn't exist on a subnet / prefix.


Revision tags: is-mlppp-base ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.243 06-Oct-2019 uwe

icmp6_notify_error - fix ctlfunc typedef to match pr_ctlinput,
drop the cast that is no longer necessary.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.242 22-Dec-2018 maxv

Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.


# 1.241 22-Dec-2018 maxv

Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.


Revision tags: pgoyette-compat-1126
# 1.240 25-Oct-2018 ozaki-r

Remove a leftover debug printf

Pointed out by hannken@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.239 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
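
The truncation hazard described above can be reproduced outside the kernel. The following is a minimal sketch, assuming a platform where unsigned int is 32 bits wide; uimin_sketch and truncation_changes_result are hypothetical names used for illustration and are not the libkern functions.

```c
#include <stdint.h>

/* Hypothetical stand-in for a min() that is defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Returns 1 if pushing two 64-bit values through the unsigned-int
 * version yields a different answer than a full-width comparison.
 */
static int
truncation_changes_result(uint64_t a, uint64_t b)
{
	/* The casts make the silent narrowing explicit. */
	uint64_t narrow = uimin_sketch((unsigned int)a, (unsigned int)b);
	uint64_t wide = a < b ? a : b;

	return narrow != wide;
}
```

With a = 2^32 + 5 and b = 10 the narrow version compares 5 against 10 and picks the wrong operand, which is the kind of silent truncation the rename is meant to make visible.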


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.238 01-Jun-2018 ozaki-r

branches: 1.238.2;
Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents the
rtentry from being accidentally released. So rt_timer handlers must release the
reference of a passed rtentry by themselves (but they didn't).


Revision tags: pgoyette-compat-0521
# 1.237 07-May-2018 maxv

Remove misleading comments.


Revision tags: pgoyette-compat-0502
# 1.236 01-May-2018 maxv

Remove now unused net_osdep.h includes, the other BSDs did the same.


# 1.235 29-Apr-2018 maxv

Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.


# 1.234 28-Apr-2018 maxv

Remove unused ipsec_var.h includes.


# 1.233 27-Apr-2018 maxv

Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of
MCLBYTES, so the area allocated is still too small.

I think it should have been MEXTMALLOC, and of course I can't test my
change.


# 1.232 26-Apr-2018 maxv

Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.


# 1.231 26-Apr-2018 maxv

Use M_UNWRITABLE, no functional change.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.230 14-Apr-2018 maxv

Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to reliably craft a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.


# 1.229 14-Apr-2018 maxv

Cosmetic, and remove one XXX (no problem).


# 1.228 14-Apr-2018 maxv

Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.


# 1.227 14-Apr-2018 maxv

Remove dead code. It is the same as the non-obsolete one, since
ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE,
and the code leads to the same errno value (EHOSTUNREACH).


# 1.226 12-Apr-2018 maxv

Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.


# 1.225 12-Apr-2018 maxv

Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.224 21-Mar-2018 roy

Sprinkle more soroverflow().


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.223 28-Feb-2018 maxv

branches: 1.223.2;
Remove unused ipsec_private.h includes.


# 1.222 26-Feb-2018 maxv

Remove redundant condition (harmless). PR/53030.


# 1.221 26-Feb-2018 maxv

Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@


# 1.220 12-Feb-2018 maxv

Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.


# 1.219 23-Jan-2018 maxv

Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.


# 1.218 23-Jan-2018 maxv

Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.


# 1.217 23-Jan-2018 maxv

Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
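
The padding problem can be illustrated with a small userland sketch. It assumes a slot that already holds hdrlen header bytes, followed by an address and then padding up to an 8-byte boundary; ROUNDUP_SKETCH and fill_option_sketch are hypothetical names, and this is not the kernel code itself.

```c
#include <stddef.h>
#include <string.h>

/* Round x up to the next multiple of y, in the spirit of roundup(). */
#define ROUNDUP_SKETCH(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/*
 * Copy 'addrlen' address bytes behind a header of 'hdrlen' bytes that is
 * assumed to be already present at 'slot', then zero whatever remains up
 * to the 8-byte boundary so no stale buffer contents are sent out.
 * Returns the total slot length.
 */
static size_t
fill_option_sketch(unsigned char *slot, size_t hdrlen,
    const unsigned char *addr, size_t addrlen)
{
	size_t used = hdrlen + addrlen;
	size_t total = ROUNDUP_SKETCH(used, 8);

	memcpy(slot + hdrlen, addr, addrlen);
	memset(slot + used, 0, total - used);	/* clear the padding */
	return total;
}
```

For a 2-byte header and a 5-byte address the slot is 8 bytes long and exactly one padding byte is cleared; without the memset that byte would carry whatever was in the buffer before.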


# 1.216 23-Jan-2018 maxv

Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.


# 1.215 23-Jan-2018 maxv

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.


Revision tags: tls-maxphys-base-20171202
# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT
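
The idea of moving a size check from run time to compile time can be sketched in portable C11 with _Static_assert, which plays the role that CTASSERT plays in the kernel. The structure and the limit below are hypothetical and only illustrate the pattern; they are not the expression checked in icmp6.c.

```c
#include <stdint.h>

/* Hypothetical header layout, for illustration only. */
struct hdr_sketch {
	uint8_t  type;
	uint8_t  code;
	uint16_t cksum;
};

#define BUF_LIMIT_SKETCH 64	/* hypothetical buffer size */

/* Evaluated by the compiler; no runtime branch or panic path remains. */
_Static_assert(sizeof(struct hdr_sketch) <= BUF_LIMIT_SKETCH,
    "header must fit in the buffer");

/* Runtime view of the same condition, kept only for demonstration. */
static int
hdr_fits(void)
{
	return sizeof(struct hdr_sketch) <= BUF_LIMIT_SKETCH;
}
```

A violated condition now breaks the build on every kernel configuration instead of panicking only on DIAGNOSTIC kernels.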


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
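
The index-instead-of-pointer approach can be sketched in a few lines. This is a single-threaded illustration under stated assumptions: it leaves out psref(9) and pserialize(9) entirely, and every name in it (ifnet_sketch, if_table_sketch, pkt_sketch, pkt_get_rcvif_sketch) is hypothetical rather than taken from the kernel.

```c
#include <stddef.h>

#define MAX_IF_SKETCH 8

struct ifnet_sketch {
	unsigned int index;
	int mtu;
};

/* Hypothetical interface collection; a slot is NULL once destroyed. */
static struct ifnet_sketch *if_table_sketch[MAX_IF_SKETCH];

/* The packet records only the index of its receiving interface. */
struct pkt_sketch {
	unsigned int rcvif_index;
};

/*
 * Resolve the index at the time of use. The result may be NULL if the
 * interface went away after the packet was received, so callers must
 * check it.
 */
static struct ifnet_sketch *
pkt_get_rcvif_sketch(const struct pkt_sketch *p)
{
	if (p->rcvif_index >= MAX_IF_SKETCH)
		return NULL;
	return if_table_sketch[p->rcvif_index];
}
```

Because the lookup can fail, every caller needs a NULL check, which is the reason for the follow-up "Add missing NULL checks" revisions in this log.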


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
  - RTF_CONNECTED is added
    - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
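
The conversion between the two time bases mentioned above can be sketched as simple offset arithmetic. This is an assumption about how such a conversion can be written, not the kernel code; the function names are hypothetical, and now_uptime and now_wall stand for samples of the monotonic and wall-clock seconds taken at the same moment.

```c
#include <stdint.h>

/*
 * Convert a timestamp kept on the monotonic (uptime) base into the
 * wall-clock base expected by userland.
 */
static int64_t
uptime_to_wall_sketch(int64_t t_uptime, int64_t now_uptime, int64_t now_wall)
{
	return t_uptime + (now_wall - now_uptime);
}

/* Inverse direction: a wall-clock value supplied by userland. */
static int64_t
wall_to_uptime_sketch(int64_t t_wall, int64_t now_uptime, int64_t now_wall)
{
	return t_wall - (now_wall - now_uptime);
}
```

An expiry stored as uptime 130 while the uptime is 100 stays 30 seconds in the future however the wall clock is adjusted; only the value shown to userland changes.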


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
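
The three rules listed above can be sketched with a toy reference-counted entry. The sketch is single-threaded and assumes nothing about the real rtentry; rtentry_sketch, rt_create_sketch, rt_lookup_sketch, rt_free_sketch and freed_count_sketch are hypothetical names.

```c
#include <stdlib.h>

struct rtentry_sketch {
	int refcnt;
};

/* Counts how many entries have been released for good. */
static int freed_count_sketch;

static struct rtentry_sketch *
rt_create_sketch(void)
{
	struct rtentry_sketch *rt = calloc(1, sizeof(*rt));

	if (rt != NULL)
		rt->refcnt = 1;	/* reference held by the table */
	return rt;
}

/* A function that returns an entry takes a reference for its caller. */
static struct rtentry_sketch *
rt_lookup_sketch(struct rtentry_sketch *rt)
{
	rt->refcnt++;
	return rt;
}

/* The only way to drop a reference; frees on the last one. */
static void
rt_free_sketch(struct rtentry_sketch *rt)
{
	if (--rt->refcnt == 0) {
		free(rt);
		freed_count_sketch++;
	}
}
```

As the note above says, the count only keeps the entry alive; it does not protect the entry from concurrent updates.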


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
Should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
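
Why an array of uint64_t can stay ABI-compatible with a structure of counters can be sketched as follows. The structure, the field names and the index names are hypothetical; the sketch only assumes that every member of the old structure is a 64-bit counter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style structure: every member is a 64-bit counter. */
struct stat_old_sketch {
	uint64_t st_error;
	uint64_t st_tooshort;
	uint64_t st_checksum;
};

/* New-style indices into an array of uint64_t, in the same order. */
enum {
	STAT_ERROR_SKETCH,
	STAT_TOOSHORT_SKETCH,
	STAT_CHECKSUM_SKETCH,
	STAT_NSTATS_SKETCH
};

/* Same total size and same offsets, so old readers see the same bytes. */
_Static_assert(sizeof(struct stat_old_sketch) ==
    STAT_NSTATS_SKETCH * sizeof(uint64_t), "size must match");
_Static_assert(offsetof(struct stat_old_sketch, st_checksum) ==
    STAT_CHECKSUM_SKETCH * sizeof(uint64_t), "offset must match");

/* What an old binary effectively does with the exported array. */
static uint64_t
read_checksum_via_old_layout(const uint64_t *stats)
{
	struct stat_old_sketch old;

	memcpy(&old, stats, sizeof(old));
	return old.st_checksum;
}
```

As long as the index order matches the old member order, a consumer built against the structure reads the right counters from the array.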


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
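The strdup()/strcpy() analogy from item 1 can be sketched in user space.
This is only an illustration under stated assumptions: struct xsockaddr,
xsockaddr_dup() and xsockaddr_copy() are made-up names, malloc() stands in
for the per-family pool, assert() stands in for KASSERT, and the address
is fixed-size rather than sized per family.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct sockaddr with BSD-style length and family fields. */
struct xsockaddr {
	uint8_t	sa_len;		/* total length of this address */
	uint8_t	sa_family;	/* address family */
	char	sa_data[14];
};

/* Like strdup(): allocate new storage and copy src into it. */
static struct xsockaddr *
xsockaddr_dup(const struct xsockaddr *src)
{
	struct xsockaddr *dst = malloc(sizeof(*dst));

	if (dst == NULL)
		return NULL;	/* allocation failed */
	memcpy(dst, src, sizeof(*dst));
	return dst;
}

/* Like strcpy(): copy into caller-provided storage of the same family. */
static struct xsockaddr *
xsockaddr_copy(struct xsockaddr *dst, const struct xsockaddr *src)
{
	assert(dst->sa_family == src->sa_family);	/* stands in for KASSERT */
	memcpy(dst, src, sizeof(*dst));
	return dst;
}
```

As with strdup(), the caller owns the duplicate and must release it
(free() here, sockaddr_free() in the kernel API described above).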


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop ( and local dos ) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
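The shape of such an alignment predicate can be sketched as follows. This
is an assumption-laden illustration, not the committed macros:
XHDR_ALIGNED_P is a made-up name, and the 4-byte requirement is chosen
only for the example.

```c
#include <stdint.h>

/*
 * Sketch in the spirit of the *_HDR_ALIGNED_P() macros: constant 1 when
 * the platform has no strict alignment constraints, otherwise a test of
 * the low address bits.  Name and 4-byte boundary are hypothetical.
 */
#ifdef __NO_STRICT_ALIGNMENT
#define	XHDR_ALIGNED_P(p)	1
#else
#define	XHDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif
```

Because the relaxed variant is a compile-time constant, a guard such as
"if (!XHDR_ALIGNED_P(hdr))" can be optimized away entirely on those
architectures, which matches the "don't pay for the test" goal above.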


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
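A minimal sketch of testing such a bitmap, assuming hypothetical names
(NODEINFO_PING6_W, NODEINFO_PING6_A and nodeinfo_allows() are not taken
from the kernel source):

```c
/* Hypothetical names for the two bits described above. */
#define	NODEINFO_PING6_W	(1 << 0)	/* 2^0: "ping6 -w" queries */
#define	NODEINFO_PING6_A	(1 << 1)	/* 2^1: "ping6 -a" queries */

/* Return nonzero if the given bit is set in the sysctl value. */
static int
nodeinfo_allows(int nodeinfo, int bit)
{
	return (nodeinfo & bit) != 0;
}
```

Under this reading, nodeinfo=1 would answer only "ping6 -w", nodeinfo=2
only "ping6 -a", and nodeinfo=3 both.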


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
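The idea of event-per-second limiting can be sketched in user space as
below. This is not the kernel routine: struct pps_state and pps_allow()
are hypothetical, the window is a plain one-second bucket, the caller
passes the current time explicitly, and treating a negative limit as
"unlimited" is an assumption of the sketch.

```c
#include <time.h>

/* Hypothetical state for an events-per-second limiter. */
struct pps_state {
	time_t	window;		/* second currently being counted */
	int	count;		/* events admitted in that second */
};

/*
 * Return 1 if an event at time `now` fits within `maxpps` events per
 * second, 0 if it should be suppressed.  A negative maxpps disables
 * the limit (assumption of this sketch).
 */
static int
pps_allow(struct pps_state *st, time_t now, int maxpps)
{
	if (now != st->window) {
		/* New second: start a fresh count. */
		st->window = now;
		st->count = 0;
	}
	if (maxpps < 0)
		return 1;
	if (st->count >= maxpps)
		return 0;
	st->count++;
	return 1;
}
```

A caller such as an ICMPv6 error path would drop the error message
whenever the check returns 0.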


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.244 09-Mar-2020 roy

route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that don't exist on a subnet / prefix.


Revision tags: ad-namecache-base3 ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.243 06-Oct-2019 uwe

icmp6_notify_error - fix ctlfunc typedef to match pr_ctlinput,
drop the cast that is no longer necessary.


Revision tags: netbsd-9-0-RELEASE netbsd-9-0-RC2 netbsd-9-0-RC1 netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.242 22-Dec-2018 maxv

Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.


# 1.241 22-Dec-2018 maxv

Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.


Revision tags: pgoyette-compat-1126
# 1.240 25-Oct-2018 ozaki-r

Remove a leftover debug printf

Pointed out by hannken@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.239 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
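
The truncation described above can be illustrated with a small userland sketch. The names uimin_sketch, MIN_SKETCH, min_via_uint and min_via_macro are made up for this illustration and are not the libkern definitions; the sketch also assumes unsigned int is 32 bits wide, which holds on common platforms but is not guaranteed by C.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in with the same shape as the renamed helper: unsigned int only. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro in the style of MIN; it evaluates each argument twice. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/* What a caller gets when 64-bit values go through the unsigned int helper. */
static uint64_t
min_via_uint(uint64_t a, uint64_t b)
{
	/* Both conversions drop the upper bits where unsigned int is 32 bits. */
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}

/* What the same caller gets from the macro, which keeps the operand type. */
static uint64_t
min_via_macro(uint64_t a, uint64_t b)
{
	return MIN_SKETCH(a, b);
}
```

With a = 0x100000001 and b = 5, the macro version keeps all 64 bits and picks 5, while the unsigned int version first reduces a to its low 32 bits and so picks a different value. That is the kind of silent change the rename is meant to make visible.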


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.238 01-Jun-2018 ozaki-r

branches: 1.238.2;
Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents the
rtentry from being accidentally released. So rt_timer handlers must release the
reference of a passed rtentry by themselves (but they didn't).


Revision tags: pgoyette-compat-0521
# 1.237 07-May-2018 maxv

Remove misleading comments.


Revision tags: pgoyette-compat-0502
# 1.236 01-May-2018 maxv

Remove now unused net_osdep.h includes, the other BSDs did the same.


# 1.235 29-Apr-2018 maxv

Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.


# 1.234 28-Apr-2018 maxv

Remove unused ipsec_var.h includes.


# 1.233 27-Apr-2018 maxv

Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of
MCLBYTES, so the area allocated is still too small.

I think it should have been MEXTMALLOC, and of course I can't test my
change.


# 1.232 26-Apr-2018 maxv

Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.


# 1.231 26-Apr-2018 maxv

Use M_UNWRITABLE, no functional change.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.230 14-Apr-2018 maxv

Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to reliably craft a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.


# 1.229 14-Apr-2018 maxv

Cosmetic, and remove one XXX (no problem).


# 1.228 14-Apr-2018 maxv

Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.


# 1.227 14-Apr-2018 maxv

Remove dead code. It is the same as the non-obsolete one, since
ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE,
and the code leads to the same errno value (EHOSTUNREACH).


# 1.226 12-Apr-2018 maxv

Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.


# 1.225 12-Apr-2018 maxv

Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.224 21-Mar-2018 roy

Sprinkle more soroverflow().


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.223 28-Feb-2018 maxv

branches: 1.223.2;
Remove unused ipsec_private.h includes.


# 1.222 26-Feb-2018 maxv

Remove redundant condition (harmless). PR/53030.


# 1.221 26-Feb-2018 maxv

Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@


# 1.220 12-Feb-2018 maxv

Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.


# 1.219 23-Jan-2018 maxv

Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.


# 1.218 23-Jan-2018 maxv

Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.


# 1.217 23-Jan-2018 maxv

Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
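
A minimal userland sketch of the idea behind this fix, not the kernel code itself: when an option is padded up to a multiple of 8 bytes, the padding has to be cleared explicitly. The function fill_lladdr_opt_sketch, the ROUNDUP_SKETCH macro and the 2-byte type/length header layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round x up to the next multiple of y (same idea as roundup()). */
#define ROUNDUP_SKETCH(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/*
 * Write a 2-byte option header (type, length in units of 8 octets)
 * followed by an addrlen-byte link-layer address into buf, and zero
 * the padding introduced by the roundup.  Returns the padded option
 * length, or 0 if the option does not fit.
 */
static size_t
fill_lladdr_opt_sketch(unsigned char *buf, size_t buflen, unsigned char type,
    const unsigned char *lladdr, size_t addrlen)
{
	const size_t hdrlen = 2;
	size_t len = ROUNDUP_SKETCH(hdrlen + addrlen, 8);

	assert(buf != NULL && lladdr != NULL);
	if (len > buflen || len / 8 > 255)
		return 0;

	buf[0] = type;
	buf[1] = (unsigned char)(len / 8);
	memcpy(buf + hdrlen, lladdr, addrlen);
	/* Clear the padding so stale buffer contents are never sent. */
	memset(buf + hdrlen + addrlen, 0, len - hdrlen - addrlen);
	return len;
}
```

With an 8-byte address the option occupies 16 bytes, of which the last 6 are padding. Without the memset, those 6 bytes would carry whatever the buffer held before.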


# 1.216 23-Jan-2018 maxv

Fix the same mistake twice: 'last' can't be null, so there's no point in
having this misleading branch.


# 1.215 23-Jan-2018 maxv

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.


Revision tags: tls-maxphys-base-20171202
# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT
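
As a rough userland analogue of this kind of change, a condition that depends only on compile-time constants can be checked by the compiler instead of at run time. The sketch below uses C11 static_assert in place of the kernel's CTASSERT; struct hdr_sketch and BUF_SKETCH_LEN are hypothetical and do not reflect the condition actually checked in icmp6.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size header, used only for this illustration. */
struct hdr_sketch {
	uint8_t  type;
	uint8_t  code;
	uint16_t cksum;
};

#define BUF_SKETCH_LEN 64	/* hypothetical buffer size */

/*
 * Run-time style: the condition is constant, yet it is only evaluated
 * when this function is actually called.
 */
static int
hdr_fits_runtime(void)
{
	return sizeof(struct hdr_sketch) <= BUF_SKETCH_LEN;
}

/* Compile-time style: a false condition stops the build. */
static_assert(sizeof(struct hdr_sketch) <= BUF_SKETCH_LEN,
    "hdr_sketch must fit in the buffer");
```

The compile-time form costs nothing at run time and is checked in every build, not only in diagnostic ones.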


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that a icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
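
The index-instead-of-pointer idea can be sketched in userland as follows. This is a toy model under stated assumptions: the table if_table_sketch, the struct names and the attach/detach helpers are invented for illustration, and the psref(9)/pserialize(9) protection that the real m_get_rcvif variants rely on is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IF_SKETCH 8	/* hypothetical size of the interface collection */

struct ifnet_sketch {
	unsigned int if_index;
	const char *if_xname;
};

/* Toy interface collection indexed by if_index; NULL means "not present". */
static struct ifnet_sketch *if_table_sketch[MAX_IF_SKETCH];

/* A packet records only the index of its receiving interface. */
struct pkt_sketch {
	unsigned int rcvif_index;
};

static void
if_attach_sketch(struct ifnet_sketch *ifp)
{
	assert(ifp != NULL && ifp->if_index < MAX_IF_SKETCH);
	if_table_sketch[ifp->if_index] = ifp;
}

static void
if_detach_sketch(unsigned int idx)
{
	assert(idx < MAX_IF_SKETCH);
	if_table_sketch[idx] = NULL;
}

/* Look up the receiving interface; NULL if it has gone away meanwhile. */
static struct ifnet_sketch *
pkt_get_rcvif_sketch(const struct pkt_sketch *p)
{
	if (p->rcvif_index >= MAX_IF_SKETCH)
		return NULL;
	return if_table_sketch[p->rcvif_index];
}
```

Because the lookup can fail once an interface is detached, every caller has to handle a NULL result, which is why NULL checks matter after this kind of conversion.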


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The obtained outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

For details of the behavior changes, see the diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
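
The conversion between the two time bases mentioned above amounts to shifting by the current offset between them. The helpers below are a hedged sketch with hypothetical names (uptime_to_wall_sketch, wall_to_uptime_sketch); they take the current readings of both clocks as parameters and are not the kernel's own routines.

```c
#include <assert.h>
#include <time.h>

/*
 * Convert an expiry time kept on the monotonic (uptime) base into
 * wall-clock time, given the current reading of both clocks.
 */
static time_t
uptime_to_wall_sketch(time_t expire_uptime, time_t now_uptime, time_t now_wall)
{
	assert(now_uptime >= 0);	/* uptime never goes negative */
	return expire_uptime - now_uptime + now_wall;
}

/* The reverse direction, for values handed in by userland. */
static time_t
wall_to_uptime_sketch(time_t expire_wall, time_t now_uptime, time_t now_wall)
{
	assert(now_uptime >= 0);
	return expire_wall - now_wall + now_uptime;
}
```

An entry that expires 30 seconds from now keeps that 30-second distance in either base, so a step of the wall clock changes only the reported value and not the remaining lifetime tracked internally.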


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
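
The three rules listed above can be mimicked with a toy reference counter. This is only a sketch of the discipline: rt_sketch, rt_lookup_sketch and rt_free_sketch are hypothetical, and a `freed` flag stands in for actually releasing memory so the behavior is easy to observe.

```c
#include <assert.h>
#include <stddef.h>

struct rt_sketch {
	int refcnt;	/* references from the table and from users */
	int freed;	/* set when the last reference is dropped */
};

/* A function returning an entry takes a reference on behalf of the caller. */
static struct rt_sketch *
rt_lookup_sketch(struct rt_sketch *rt)
{
	assert(rt != NULL && !rt->freed);
	rt->refcnt++;
	return rt;
}

/* Users always drop their reference through this function, never directly. */
static void
rt_free_sketch(struct rt_sketch *rt)
{
	assert(rt != NULL && rt->refcnt > 0);
	if (--rt->refcnt == 0)
		rt->freed = 1;
}
```

A caller pairs each lookup with exactly one free and does not touch the entry afterwards. As the message notes, this keeps the entry alive but does nothing to stop concurrent updates.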


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
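
One way to picture why an array of uint64_t can stay ABI-compatible with the old structure is sketched below. The counter names, their order and the helper functions are hypothetical; the sketch only assumes that the old structure consisted solely of 64-bit counters laid out back to back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counter indexes; the real names and order are not shown here. */
enum {
	STAT_SKETCH_ERROR = 0,
	STAT_SKETCH_TOOSHORT = 1,
	STAT_SKETCH_CHECKSUM = 2,
	STAT_SKETCH_NSTATS = 3
};

/* Old-style structure that existing tools are assumed to read. */
struct old_stat_sketch {
	uint64_t error;
	uint64_t tooshort;
	uint64_t checksum;
};

/* The array and the structure must have the same size to be interchangeable. */
static_assert(sizeof(struct old_stat_sketch) ==
    sizeof(uint64_t) * STAT_SKETCH_NSTATS, "layout mismatch");

static uint64_t stats_sketch[STAT_SKETCH_NSTATS];

/* Bump one counter by index. */
static void
stat_inc_sketch(int idx)
{
	assert(idx >= 0 && idx < STAT_SKETCH_NSTATS);
	stats_sketch[idx]++;
}

/* Copy the counters out in the old structure layout. */
static void
stat_export_sketch(struct old_stat_sketch *out)
{
	memcpy(out, stats_sketch, sizeof(*out));
}
```

Indexing by a constant makes it straightforward to keep one such array per CPU later and sum them on export, without changing what readers of the old layout see.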


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of the
address part at `slenp', if `slenp' is not NULL.
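
As a rough userland illustration of what an accessor such as
sockaddr_const_addr() does, the sketch below switches on sa_family
and handles only AF_INET and AF_INET6. The name
demo_sockaddr_const_addr() is hypothetical, and the kernel routine
described above need not be implemented this way.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical userland analogue of sockaddr_const_addr(): return a
 * pointer to the address part of `sa' and, if `slenp' is not NULL,
 * store the length of that part there.  Only AF_INET and AF_INET6
 * are handled; any other family yields NULL.
 */
static const void *
demo_sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)
{
	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin =
		    (const struct sockaddr_in *)sa;

		if (slenp != NULL)
			*slenp = sizeof(sin->sin_addr);
		return &sin->sin_addr;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 =
		    (const struct sockaddr_in6 *)sa;

		if (slenp != NULL)
			*slenp = sizeof(sin6->sin6_addr);
		return &sin6->sin6_addr;
	}
	default:
		return NULL;
	}
}
```

Passing NULL for `slenp' simply skips the length write, matching the
"if `slenp' is not NULL" wording above.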


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.
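
The difference between the two checks can be shown with a toy model.
This is only a sketch: struct demo_mbuf and both helpers are
hypothetical stand-ins, not the kernel's struct mbuf or its macros,
and the sketch assumes that "unwritable" means "too short or backed
by read-only storage".

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an mbuf; NOT the kernel's struct mbuf. */
struct demo_mbuf {
	size_t	m_len;		/* contiguous bytes in this buffer */
	bool	m_readonly;	/* storage must not be written in place */
};

/* Length-only test: says nothing about writability. */
static bool
demo_too_short(const struct demo_mbuf *m, size_t len)
{
	return m->m_len < len;
}

/*
 * Assumed meaning of an "unwritable" test: the first `len' bytes are
 * either not contiguous or not safe to modify in place.
 */
static bool
demo_unwritable(const struct demo_mbuf *m, size_t len)
{
	return m->m_len < len || m->m_readonly;
}
```

For a buffer that is long enough but read-only, demo_too_short()
reports false while demo_unwritable() reports true, which is the case
where a pull-up is still needed before writing.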


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
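
The chaining and flushing described above can be modelled in a few
lines of userland C. This is a sketch only: every demo_* name is
hypothetical, reference counting is left out, and the real
in_rtcache()/in_rtflush()/in_rtflushall() may be structured
differently.

```c
#include <stddef.h>

struct demo_rtentry {
	int	unused;			/* placeholder for a routing entry */
};

struct demo_route {
	struct demo_rtentry *ro_rt;	/* cached route, or NULL */
	struct demo_route *ro_next;	/* next cache on the chain */
};

struct demo_domain {
	struct demo_route *caches;	/* caches where ro_rt != NULL */
};

/* Record a newly cached route and add the cache to the chain. */
static void
demo_rtcache(struct demo_domain *d, struct demo_route *ro,
    struct demo_rtentry *rt)
{
	ro->ro_rt = rt;
	ro->ro_next = d->caches;
	d->caches = ro;
}

/* Invalidate one cache: unlink it and set ro_rt to NULL. */
static void
demo_rtflush(struct demo_domain *d, struct demo_route *ro)
{
	struct demo_route **pp;

	for (pp = &d->caches; *pp != NULL; pp = &(*pp)->ro_next) {
		if (*pp == ro) {
			*pp = ro->ro_next;
			break;
		}
	}
	ro->ro_rt = NULL;
	ro->ro_next = NULL;
}

/* Invalidate every cache on the chain. */
static void
demo_rtflushall(struct demo_domain *d)
{
	while (d->caches != NULL)
		demo_rtflush(d, d->caches);
}
```

Because a flush can happen at any time, users of a cache in this
model must be prepared to find ro_rt == NULL and look the route up
again, which is the expectation the commit places on route-cache
users.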


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
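
The src=::1 case mentioned above can be expressed as a small
predicate. This is a userland sketch under stated assumptions:
demo_scope_ok() is a hypothetical name, it covers only the loopback
case from the example, and it says nothing about how the kernel's
scope boundary check is actually written.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical check for the one case named in the commit message:
 * a packet whose source is the loopback address (::1) must not be
 * sent to a destination other than the loopback address, because
 * ::1 is only meaningful within a single node.
 */
static bool
demo_scope_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
	if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
		return false;
	return true;
}
```

With src parsed from "::1" and dst from a global address such as the
documentation prefix address "2001:db8::1", the predicate rejects the
pair; loopback-to-loopback and global-to-global pairs pass.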


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
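
A minimal sketch of what such an alignment predicate could look like
follows. The DEMO_* names are hypothetical, and the 4-byte boundary is
an assumption made for the illustration rather than something stated
in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical alignment predicate.  When the build declares that the
 * architecture has no strict alignment constraints, it expands to the
 * constant 1 so the compiler can drop the test and the realign path.
 * Otherwise it checks for a 4-byte boundary (an assumption here).
 */
#ifdef DEMO_NO_STRICT_ALIGNMENT
#define DEMO_HDR_ALIGNED_P(p)	1
#else
#define DEMO_HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/* True when a header at `hdr' would first have to be copied/aligned. */
static bool
demo_needs_realign(const void *hdr)
{
	return !DEMO_HDR_ALIGNED_P(hdr);
}
```

Input paths would use the predicate to decide whether to take the
align-if-needed route, and other code could assert it where alignment
is already guaranteed.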


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
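
Reading a bitmap-valued setting like this one amounts to masking
individual bits. The sketch below is illustrative only: the DEMO_*
macro names and demo_nodeinfo_allows() are hypothetical, and only the
bit positions (2^0 and 2^1) come from the message above.

```c
#include <stdbool.h>

/* Bit positions as given in the commit message; names are made up. */
#define DEMO_NODEINFO_PING6_W	(1 << 0)	/* 2^0: "ping6 -w" */
#define DEMO_NODEINFO_PING6_A	(1 << 1)	/* 2^1: "ping6 -a" */

/* True when `bit' is enabled in the bitmap value `nodeinfo'. */
static bool
demo_nodeinfo_allows(int nodeinfo, int bit)
{
	return (nodeinfo & bit) != 0;
}
```

Under this reading, a value of 1 enables only the "ping6 -w" case,
2 only the "ping6 -a" case, 3 both, and 0 neither.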


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
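
Event-per-second limiting of this kind can be sketched in userland
as a counter that resets whenever the second changes. Everything
named demo_* is hypothetical. The fixed one-second window, and the
treatment of a negative limit as "no limit", are choices made for
this sketch and are not taken from the commit message.

```c
#include <stdbool.h>
#include <time.h>

struct demo_pps {
	time_t	last_sec;	/* second in which the window started */
	int	count;		/* events allowed in that window */
};

/*
 * Return true if an event at time `now' (in whole seconds) may
 * proceed under a limit of `maxpps' events per second.
 */
static bool
demo_ppscheck(struct demo_pps *st, time_t now, int maxpps)
{
	if (now != st->last_sec) {
		/* A new second has begun: start a fresh window. */
		st->last_sec = now;
		st->count = 0;
	}
	if (maxpps < 0)
		return true;	/* sketch's convention: no limit */
	if (st->count >= maxpps)
		return false;	/* budget for this second is spent */
	st->count++;
	return true;
}
```

A caller such as an ICMPv6 error path would keep one struct demo_pps
per limited event class and drop the error when the check fails.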


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.243 06-Oct-2019 uwe

icmp6_notify_error - fix ctlfunc typedef to match pr_ctlinput,
drop the cast that is no longer necessary.


Revision tags: netbsd-9-base phil-wifi-20190609 isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.242 22-Dec-2018 maxv

Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.


# 1.241 22-Dec-2018 maxv

Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.


Revision tags: pgoyette-compat-1126
# 1.240 25-Oct-2018 ozaki-r

Remove a leftover debug printf

Pointed out by hannken@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.239 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
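
The truncation concern described above can be illustrated with a minimal userland sketch. The names `uimin_sketch` and `MIN_SKETCH` are hypothetical stand-ins, not the libkern definitions, and the behavior noted in the comments assumes a 32-bit `unsigned int`.

```c
#include <stdint.h>

/* Hypothetical stand-in for a helper defined on unsigned int only. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Macro style that some subsystems define locally. It does not
 * truncate, but it evaluates its arguments more than once.
 */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/* 64-bit operands are narrowed to unsigned int before comparison. */
static uint64_t
min_via_uimin(uint64_t a, uint64_t b)
{
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}

/* 64-bit operands are compared at full width. */
static uint64_t
min_via_macro(uint64_t a, uint64_t b)
{
	return MIN_SKETCH(a, b);
}
```

With a = 0x100000001 and b = 5, the helper path compares the narrowed value 1 against 5 and yields 1, while the macro path yields 5.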


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.238 01-Jun-2018 ozaki-r

branches: 1.238.2;
Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents the
rtentry from being accidentally released. So rt_timer handlers must release the reference
of a passed rtentry by themselves (but they didn't).


Revision tags: pgoyette-compat-0521
# 1.237 07-May-2018 maxv

Remove misleading comments.


Revision tags: pgoyette-compat-0502
# 1.236 01-May-2018 maxv

Remove now unused net_osdep.h includes, the other BSDs did the same.


# 1.235 29-Apr-2018 maxv

Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.


# 1.234 28-Apr-2018 maxv

Remove unused ipsec_var.h includes.


# 1.233 27-Apr-2018 maxv

Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of
MCLBYTES, so the area allocated is still too small.

I think it should have been MEXTMALLOC, and of course I can't test my
change.


# 1.232 26-Apr-2018 maxv

Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.


# 1.231 26-Apr-2018 maxv

Use M_UNWRITABLE, no functional change.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.230 14-Apr-2018 maxv

Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to craft reliably a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.


# 1.229 14-Apr-2018 maxv

Cosmetic, and remove one XXX (no problem).


# 1.228 14-Apr-2018 maxv

Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.


# 1.227 14-Apr-2018 maxv

Remove dead code. It is the same as the non-obsolete one, since
ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE,
and the code leads to the same errno value (EHOSTUNREACH).


# 1.226 12-Apr-2018 maxv

Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.


# 1.225 12-Apr-2018 maxv

Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.224 21-Mar-2018 roy

Sprinkle more soroverflow().


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.223 28-Feb-2018 maxv

branches: 1.223.2;
Remove unused ipsec_private.h includes.


# 1.222 26-Feb-2018 maxv

Remove redundant condition (harmless). PR/53030.


# 1.221 26-Feb-2018 maxv

Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@


# 1.220 12-Feb-2018 maxv

Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.


# 1.219 23-Jan-2018 maxv

Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.


# 1.218 23-Jan-2018 maxv

Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.


# 1.217 23-Jan-2018 maxv

Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
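
A minimal userland sketch of this class of fix, assuming a two-byte option header followed by the link-layer address: zero the whole rounded-up slot before filling it, so the padding never carries stale bytes. The function name, the `ROUNDUP8` macro, and the option type constant are illustrative assumptions, not the code in icmp6.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_HDRLEN_SKETCH	2u	/* type byte + length byte */
#define OPT_TYPE_SKETCH		2u	/* illustrative option type value */
#define ROUNDUP8(x)		(((x) + 7u) & ~(size_t)7u)

/*
 * Write an option carrying a link-layer address into buf.
 * Returns the number of bytes used (a multiple of 8), or 0 if the
 * option does not fit in buf or in the one-byte length field.
 */
static size_t
fill_lladdr_option(uint8_t *buf, size_t buflen,
    const uint8_t *lladdr, size_t addrlen)
{
	size_t total = ROUNDUP8(OPT_HDRLEN_SKETCH + addrlen);

	if (total > buflen || total / 8 > 255)
		return 0;

	/* Zero the entire slot first, padding included. */
	memset(buf, 0, total);
	buf[0] = OPT_TYPE_SKETCH;
	buf[1] = (uint8_t)(total / 8);	/* length in units of 8 bytes */
	memcpy(buf + OPT_HDRLEN_SKETCH, lladdr, addrlen);
	return total;
}
```

An 8-byte address needs 10 bytes, which rounds up to 16; the trailing 6 bytes are the padding that would otherwise expose whatever was previously in the buffer.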


# 1.216 23-Jan-2018 maxv

Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.


# 1.215 23-Jan-2018 maxv

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.


Revision tags: tls-maxphys-base-20171202
# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that a icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
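
The index-instead-of-pointer idea can be sketched in userland as follows. All names ending in `_sketch` are hypothetical, and the sketch leaves out the pserialize(9)/psref(9) protection that keeps the returned object alive in the kernel; it only shows why a lookup can now return NULL and why callers need NULL checks.

```c
#include <stddef.h>

#define NIF_SKETCH 8

struct ifnet_sketch { unsigned int if_index; };
struct mbuf_sketch { unsigned int rcvif_index; };	/* index, not pointer */

/* Hypothetical interface collection, indexed by if_index. */
static struct ifnet_sketch *ifindex2ifnet_sketch[NIF_SKETCH];

static void
if_attach_sketch(struct ifnet_sketch *ifp)
{
	if (ifp->if_index < NIF_SKETCH)
		ifindex2ifnet_sketch[ifp->if_index] = ifp;
}

static void
if_detach_sketch(struct ifnet_sketch *ifp)
{
	if (ifp->if_index < NIF_SKETCH &&
	    ifindex2ifnet_sketch[ifp->if_index] == ifp)
		ifindex2ifnet_sketch[ifp->if_index] = NULL;
}

/* Resolve the receive interface; NULL if it has gone away. */
static struct ifnet_sketch *
m_get_rcvif_sketch(const struct mbuf_sketch *m)
{
	if (m->rcvif_index >= NIF_SKETCH)
		return NULL;
	return ifindex2ifnet_sketch[m->rcvif_index];
}
```

A packet that outlives its interface then resolves to NULL instead of dereferencing a dangling pointer.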


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
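
The conversion between the two time bases can be sketched as a fixed offset taken at the moment of conversion. The function names and the explicit `now_*` parameters are assumptions made for illustration; this is not the kernel's interface.

```c
#include <stdint.h>

/*
 * Convert a timestamp kept on the monotonic uptime clock into
 * wall-clock seconds, e.g. before handing it to userland.
 */
static int64_t
uptime_to_wall_sketch(int64_t t_uptime, int64_t now_uptime, int64_t now_wall)
{
	return t_uptime + (now_wall - now_uptime);
}

/* Inverse: convert a wall-clock timestamp received from userland. */
static int64_t
wall_to_uptime_sketch(int64_t t_wall, int64_t now_uptime, int64_t now_wall)
{
	return t_wall - (now_wall - now_uptime);
}
```

Because expirations are stored on the uptime base, a later change to the wall clock shifts only the offset used at conversion time, not the stored deadlines.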


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
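
The three rules listed above can be sketched with a toy reference-counted object. The `_sketch` names are hypothetical and the code is single-threaded; it only illustrates the acquire-on-return, release-after-use discipline.

```c
#include <stddef.h>

struct rtentry_sketch {
	int refcnt;
	int freed;	/* set when the last reference is dropped */
};

/* Functions that return an entry hand out a counted reference. */
static struct rtentry_sketch *
rt_lookup_sketch(struct rtentry_sketch *rt)
{
	rt->refcnt++;
	return rt;
}

/* Callers always release through this, never via a bare refcnt--. */
static void
rt_free_sketch(struct rtentry_sketch *rt)
{
	if (--rt->refcnt == 0)
		rt->freed = 1;
}
```

A caller pairs each lookup with exactly one release and does not touch the entry after releasing it.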


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sin6_family,
sin6_len, and sin6_addr) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of the
address part at `slenp', if `slenp' is not NULL.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
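
The ::1 example in the list above can be sketched in userland C. This is only an illustration of the rule that a loopback source must not be paired with a non-loopback destination; the function name is hypothetical and the kernel's actual scope checks are considerably more general.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical userland check, not the kernel's scope code: refuse a
 * packet whose source is the loopback address ::1 unless the destination
 * is ::1 as well, since ::1 is only meaningful within a single node.
 * Returns false for strings that do not parse as IPv6 addresses.
 */
static bool
loopback_scope_ok(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return false;
	if (IN6_IS_ADDR_LOOPBACK(&s) && !IN6_IS_ADDR_LOOPBACK(&d))
		return false;	/* ::1 must not leave the node */
	return true;
}
```

Under this sketch, a packet from ::1 to a global address such as 2001:db8::1 is rejected, while ::1 to ::1 is allowed.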

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
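
A minimal sketch of how such an alignment predicate could look is given below. The macro names are hypothetical and the 4-byte boundary is an assumption chosen for illustration; the real *_HDR_ALIGNED_P() definitions live in the kernel headers.

```c
#include <stdint.h>

/* Hypothetical: true if the pointer sits on a 4-byte boundary. */
#define SKETCH_ALIGNED4_P(p)	((((uintptr_t)(p)) & 3) == 0)

/*
 * Hypothetical macro modelled on the description above: on platforms
 * without strict alignment constraints the test collapses to the
 * constant 1, so callers never pay for the check or take the
 * align-if-needed path.
 */
#ifdef __NO_STRICT_ALIGNMENT
#define SKETCH_HDR_ALIGNED_P(p)	1
#else
#define SKETCH_HDR_ALIGNED_P(p)	SKETCH_ALIGNED4_P(p)
#endif
```

A caller would test the header pointer with SKETCH_HDR_ALIGNED_P() and only copy the header into an aligned buffer when the predicate is false.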


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
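
The bitmap interpretation of net.inet6.icmp6.nodeinfo can be sketched as follows. The macro and function names are hypothetical, chosen only to illustrate testing the two bits mentioned above.

```c
#include <stdbool.h>

/* Hypothetical names for the two bits described above. */
#define SKETCH_NI_PING6_W	(1 << 0)	/* 2^0: answer ping6 -w queries */
#define SKETCH_NI_PING6_A	(1 << 1)	/* 2^1: answer ping6 -a queries */

/* Return true if `bit' is set in the sysctl value. */
static bool
nodeinfo_enabled(int sysctl_value, int bit)
{
	return (sysctl_value & bit) != 0;
}
```

Under this reading, a value of 3 enables both query types, 2 enables only the ping6 -a case, and 0 disables both.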


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
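
The hiwat/lowat rule above can be summarized in a small sketch. The function name and parameters are hypothetical; the code only illustrates choosing the limit by whether the message was validated, and is not the kernel's implementation.

```c
#include <stdbool.h>

/*
 * Hypothetical policy helper: may another PMTUD route entry be created?
 * `validated' says whether an upper-layer xx_ctlinput routine confirmed
 * the too big message; `entries' is the current number of PMTUD route
 * entries.  Validated traffic is held to `hiwat', everything else to the
 * lower `lowat', so unvalidated messages cannot fill the routing table.
 */
static bool
pmtud_entry_allowed(bool validated, int entries, int hiwat, int lowat)
{
	int limit = validated ? hiwat : lowat;

	return entries < limit;
}
```

For example, with a hypothetical hiwat of 1280 and lowat of 256, the 257th entry is refused for unvalidated traffic but still allowed for validated traffic.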


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
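
A minimal sketch of events-per-second rate limiting as described above. This
is an illustration only, not the kernel routine: the names
pps_state_sketch and pps_allow_sketch are hypothetical, and the caller is
assumed to supply the current time in whole seconds.

```c
#include <limits.h>

/* Hypothetical per-limiter state: the second being counted and its tally. */
struct pps_state_sketch {
	long last_sec;
	int count;
};

/*
 * Returns 1 if the event is within the per-second budget, 0 if it should
 * be suppressed.  The counter restarts whenever the second changes.
 */
static int
pps_allow_sketch(struct pps_state_sketch *st, long now_sec, int maxpps)
{
	if (st->last_sec != now_sec) {
		st->last_sec = now_sec;
		st->count = 0;
	}
	/* Saturate instead of overflowing under a sustained flood. */
	if (st->count < INT_MAX)
		st->count++;
	return st->count <= maxpps;
}
```

With a budget of 2, the first two events in a given second pass and later
ones are dropped until the second rolls over.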


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield changes from the 04 draft to the 05 draft, which make
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226
# 1.242 22-Dec-2018 maxv

Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.


# 1.241 22-Dec-2018 maxv

Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.


Revision tags: pgoyette-compat-1126
# 1.240 25-Oct-2018 ozaki-r

Remove a leftover debug printf

Pointed out by hannken@


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906
# 1.239 03-Sep-2018 riastradh

Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
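
A minimal userland sketch of the truncation described above, assuming a
32-bit unsigned int. The names uimin_sketch, MIN_SKETCH and truncating_min
are hypothetical stand-ins, not the libkern definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for a min() that is defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no truncation, but an argument may be evaluated twice. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/*
 * Passing 64-bit values to the unsigned int function: the high bits are
 * discarded before the comparison, so the result can be wrong.
 */
static uint64_t
truncating_min(uint64_t a, uint64_t b)
{
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}
```

For a = 2^32 + 1 and b = 5, the function form compares 1 with 5 and yields
1, while the macro form yields 5.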


Revision tags: pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625
# 1.238 01-Jun-2018 ozaki-r

Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents
the rtentry from being accidentally released. So rt_timer handlers must release
the reference of a passed rtentry by themselves (but they didn't).


Revision tags: pgoyette-compat-0521
# 1.237 07-May-2018 maxv

Remove misleading comments.


Revision tags: pgoyette-compat-0502
# 1.236 01-May-2018 maxv

Remove now unused net_osdep.h includes, the other BSDs did the same.


# 1.235 29-Apr-2018 maxv

Replace
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is obvious that 'm' has M_PKTHDR set.


# 1.234 28-Apr-2018 maxv

Remove unused ipsec_var.h includes.


# 1.233 27-Apr-2018 maxv

Fix a bug introduced in rev1.154 (2009). mcl_cache still has a size of
MCLBYTES, so the area allocated is still too small.

I think it should have been MEXTMALLOC, and of course I can't test my
change.


# 1.232 26-Apr-2018 maxv

Stop using m_copy(), use m_copym() directly. m_copy is useless,
undocumented and confusing.


# 1.231 26-Apr-2018 maxv

Use M_UNWRITABLE, no functional change.


Revision tags: pgoyette-compat-0422 pgoyette-compat-0415
# 1.230 14-Apr-2018 maxv

Fix 'icmp6len', it shouldn't be ip6_plen, because we may not be at the
beginning of the packet (off+ip6_plen is beyond the end of the mbuf). By
luck, the IP6_EXTHDR_GET that follows will fail and prevent buffer
overflows in non-jumbogram packets.

For jumbograms we will probably be in trouble here; but it doesn't seem
possible to craft reliably a jumbogram for a non-jumbogram-enabled device.

So I don't think it's a huge problem.


# 1.229 14-Apr-2018 maxv

Cosmetic, and remove one XXX (no problem).


# 1.228 14-Apr-2018 maxv

Remove the RH0 code from ICMPv6. RH0 is deprecated by RFC5095 (2007) for
security reasons. We already removed it in Route6.

In addition there was an mbuf bug here: calling IP6_EXTHDR_GET twice with
the same offset, but still using the pointer from the first call, which
could have been made invalid. By luck, m_pulldown leaves zero-sized mbufs
in place, instead of freeing them.

And in general, using a 'finaldst' pointer on the mbuf, and then modifying
that mbuf with IP6_EXTHDR_GET with a smaller offset, was really error-
prone.


# 1.227 14-Apr-2018 maxv

Remove dead code. It is the same as the non-obsolete one, since
ICMP6_DST_UNREACH_NOTNEIGHBOR == ICMP6_DST_UNREACH_BEYONDSCOPE,
and the code leads to the same errno value (EHOSTUNREACH).


# 1.226 12-Apr-2018 maxv

Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.


# 1.225 12-Apr-2018 maxv

Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322
# 1.224 21-Mar-2018 roy

Sprinkle more soroverflow().


Revision tags: pgoyette-compat-0315 pgoyette-compat-base
# 1.223 28-Feb-2018 maxv

branches: 1.223.2;
Remove unused ipsec_private.h includes.


# 1.222 26-Feb-2018 maxv

Remove redundant condition (harmless). PR/53030.


# 1.221 26-Feb-2018 maxv

Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@


# 1.220 12-Feb-2018 maxv

Replace bcopy -> memcpy when it is obvious that the areas don't overlap.
Rearrange ip6_splithdr() for clarity.


# 1.219 23-Jan-2018 maxv

Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.


# 1.218 23-Jan-2018 maxv

Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.


# 1.217 23-Jan-2018 maxv

Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
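
A minimal sketch of the kind of fix described above, assuming a 2-byte
option header (type, length in 8-octet units) followed by the link-layer
address. The names ROUNDUP8, OPT_HDR_LEN and fill_lladdr_opt are
hypothetical; this is not the code from icmp6.c.

```c
#include <stddef.h>
#include <string.h>

#define ROUNDUP8(n)	(((size_t)(n) + 7u) & ~(size_t)7u)
#define OPT_HDR_LEN	2u	/* assumed: one type byte, one length byte */

/*
 * Writes a link-layer address option into buf and returns its padded
 * length, or 0 if it does not fit.  The whole slot is zeroed first, so
 * the padding added by the roundup never carries stale memory.
 */
static size_t
fill_lladdr_opt(unsigned char *buf, size_t buflen,
    const unsigned char *lladdr, size_t addrlen)
{
	size_t total = ROUNDUP8(OPT_HDR_LEN + addrlen);

	if (total > buflen || total / 8 > 255)
		return 0;
	memset(buf, 0, total);			/* clears the padding too */
	buf[0] = 2;				/* target link-layer address */
	buf[1] = (unsigned char)(total / 8);	/* length in 8-octet units */
	memcpy(buf + OPT_HDR_LEN, lladdr, addrlen);
	return total;
}
```

With an 8-byte address the slot is 16 bytes, of which the last 6 are
padding that would otherwise keep whatever the buffer held before.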


# 1.216 23-Jan-2018 maxv

Fix twice the same mistake: 'last' can't be null, so there's no point in
having this misleading branch.


# 1.215 23-Jan-2018 maxv

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.


Revision tags: tls-maxphys-base-20171202
# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that a icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.
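
A minimal sketch of the conversion described above, assuming both clocks
count whole seconds and are sampled at the same moment. The function and
parameter names are hypothetical and do not come from the kernel sources.

```c
#include <stdint.h>

/*
 * Translate an expiry recorded on the monotonic uptime clock into a
 * wall-clock time for userland: keep the remaining interval, rebase it.
 */
static int64_t
uptime_to_wallclock(int64_t expire_uptime, int64_t now_uptime,
    int64_t now_wall)
{
	return expire_uptime - now_uptime + now_wall;
}

/* The reverse direction, for times handed in by userland. */
static int64_t
wallclock_to_uptime(int64_t expire_wall, int64_t now_uptime,
    int64_t now_wall)
{
	return expire_wall - now_wall + now_uptime;
}
```

Because only the remaining interval is carried across, a jump in the wall
clock changes the reported expiry but not when the entry actually expires.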

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
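
A minimal sketch of the reference-counting discipline listed above, written
as a generic illustration. The names route_sketch, route_lookup_sketch and
route_free_sketch are hypothetical, and the real rtfree() has further
conditions that are not modeled here.

```c
/* Hypothetical entry: one count covers table and in-flight references. */
struct route_sketch {
	int refcnt;
	int freed;	/* set once the last reference is dropped */
};

/* A function returning an entry hands it out with the count raised. */
static struct route_sketch *
route_lookup_sketch(struct route_sketch *rt)
{
	rt->refcnt++;
	return rt;
}

/* Callers drop their reference only through this function. */
static void
route_free_sketch(struct route_sketch *rt)
{
	if (--rt->refcnt == 0)
		rt->freed = 1;
}
```

A caller pairs every lookup with exactly one free and does not touch the
entry afterwards. As the message notes, this prevents the entry from being
freed but does not protect it against concurrent updates.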


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if ipsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
Should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
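
The ABI note above can be illustrated with a small userland sketch. The structure, field names, and index macros below are hypothetical stand-ins, not the real icmp6stat layout. The sketch assumes the old structure held nothing but 64-bit counters, which is what would let an array of uint64_t's occupy the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style statistics structure: only 64-bit counters. */
struct toy_oldstat {
	uint64_t ts_error;	/* errors generated */
	uint64_t ts_badcode;	/* bad code field */
	uint64_t ts_checksum;	/* bad checksum */
};

/* Hypothetical new-style indices into an array of counters. */
#define TOY_STAT_ERROR		0
#define TOY_STAT_BADCODE	1
#define TOY_STAT_CHECKSUM	2
#define TOY_NSTATS		3

/* The array occupies exactly the bytes the structure did. */
static_assert(sizeof(struct toy_oldstat) == TOY_NSTATS * sizeof(uint64_t),
    "array and structure must have the same size");
static_assert(offsetof(struct toy_oldstat, ts_checksum) ==
    TOY_STAT_CHECKSUM * sizeof(uint64_t),
    "index must match the old field offset");

/* Copy the counters out the way an old binary would read them. */
static struct toy_oldstat
toy_export(const uint64_t stats[TOY_NSTATS])
{
	struct toy_oldstat old;

	memcpy(&old, stats, sizeof(old));
	return old;
}
```

An old consumer that still expects the structure sees the same values, provided each array index is kept equal to the position of the field it replaced.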


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.
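
The idea in the entry above (copy the interesting fields into the tag instead of keeping a pointer that can dangle) can be sketched in userland as follows. All type and function names here are hypothetical toys, and the field set is an assumption based only on the three items the entry lists; the real ip6aux and in6_ifaddr differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy interface address record; the real in6_ifaddr is far larger. */
struct toy_ifaddr {
	uint8_t		ia_addr[16];	/* IPv6 address */
	uint32_t	ia_scope_id;	/* scope (zone) ID */
	int		ia_flags;	/* address flags */
};

/* Toy packet tag payload: holds copies, not a pointer to toy_ifaddr. */
struct toy_ip6aux {
	uint8_t		aux_addr[16];
	uint32_t	aux_scope_id;
	int		aux_flags;
};

/* Snapshot the interesting fields so the tag cannot dangle later. */
static void
toy_setdstifaddr(struct toy_ip6aux *aux, const struct toy_ifaddr *ia)
{
	assert(aux != NULL && ia != NULL);
	memcpy(aux->aux_addr, ia->ia_addr, sizeof(aux->aux_addr));
	aux->aux_scope_id = ia->ia_scope_id;
	aux->aux_flags = ia->ia_flags;
}
```

Because the tag owns its copies, the source record can be changed or released afterwards without affecting the tag, and no reference count is needed.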


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of
the address part at `slenp', if `slenp' is not NULL.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.
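
The difference between the two checks can be sketched with a toy buffer type. This is a userland illustration only: struct toy_mbuf and both helpers are hypothetical, and the sketch assumes that "unwritable" means "too short, or contiguous but read-only", which is what the entry above implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an mbuf: only the fields this sketch needs. */
struct toy_mbuf {
	size_t	m_len;		/* bytes that are contiguous in this buffer */
	bool	m_readonly;	/* storage is shared and must not be modified */
};

/* Length-only test: says nothing about whether the bytes may be written. */
static bool
toy_too_short(const struct toy_mbuf *m, size_t len)
{
	assert(m != NULL);
	return m->m_len < len;
}

/*
 * Writability test in the spirit of M_UNWRITABLE(m, len): true when the
 * first len bytes are not contiguous, or are contiguous but read-only.
 */
static bool
toy_unwritable(const struct toy_mbuf *m, size_t len)
{
	assert(m != NULL);
	return m->m_len < len || m->m_readonly;
}
```

A buffer that is long enough but read-only passes the length-only test and fails the writability test, which is the case the length comparison silently misses.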


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
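
Item 1 of the DETAILS above compares sockaddr_dup() and sockaddr_copy() to strdup() and strcpy(). A minimal userland sketch of that analogy follows. Everything in it is a toy: struct toy_sockaddr is a hypothetical fixed-size type, malloc()/free() stand in for the per-family pool, the 'flags' argument is omitted, and assert() stands in for KASSERT.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy sockaddr with BSD-style length and family fields plus raw data. */
struct toy_sockaddr {
	uint8_t	sa_len;		/* total length of this sockaddr in bytes */
	uint8_t	sa_family;	/* address family */
	char	sa_data[14];	/* family-specific address bytes */
};

/* Like strdup(): allocate a new sockaddr and copy src into it. */
static struct toy_sockaddr *
toy_sockaddr_dup(const struct toy_sockaddr *src)
{
	struct toy_sockaddr *dst = malloc(sizeof(*dst));

	if (dst == NULL)
		return NULL;	/* mirrors "NULL if the pool is exhausted" */
	memcpy(dst, src, sizeof(*dst));
	return dst;
}

/* Like strcpy(): copy src over dst; the families must already match. */
static struct toy_sockaddr *
toy_sockaddr_copy(struct toy_sockaddr *dst, const struct toy_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, sizeof(*dst));
	return dst;
}

/* Release a sockaddr obtained from toy_sockaddr_dup(). */
static void
toy_sockaddr_free(struct toy_sockaddr *sa)
{
	free(sa);
}
```

As with strdup(), the caller owns the duplicate and has to release it; as with strcpy(), the copy routine returns its destination and allocates nothing.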


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
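
The bitmap encoding of net.inet6.icmp6.nodeinfo mentioned above can be sketched as two flag tests. The macro and function names are hypothetical; only the bit assignments (2^0 for ping6 -w, 2^1 for ping6 -a) come from the entry itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the two bits described in the log entry. */
#define TOY_NODEINFO_FQDNOK	(1 << 0)	/* 2^0: answer ping6 -w queries */
#define TOY_NODEINFO_NODEADDROK	(1 << 1)	/* 2^1: answer ping6 -a queries */

/* True when the bitmap permits replies to ping6 -w queries. */
static bool
toy_answers_fqdn(int nodeinfo)
{
	return (nodeinfo & TOY_NODEINFO_FQDNOK) != 0;
}

/* True when the bitmap permits replies to ping6 -a queries. */
static bool
toy_answers_nodeaddr(int nodeinfo)
{
	return (nodeinfo & TOY_NODEINFO_NODEADDROK) != 0;
}
```

Under this encoding a value of 3 would enable both kinds of reply and 0 would disable both.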


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
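
A rough user-space sketch of event-per-second rate limiting, the idea this entry describes. The structure and function names are hypothetical and the logic is an assumption about the general technique, not the kernel routine itself.

```c
#include <time.h>

/* Hypothetical per-limiter state. */
struct pps_state {
	time_t window;	/* second in which the current count started */
	int count;	/* events allowed so far in that second */
};

/*
 * Return 1 if one more event may be emitted during second "now",
 * 0 if the per-second budget (maxpps) is already used up.
 */
static int
pps_allow(struct pps_state *st, time_t now, int maxpps)
{
	if (now != st->window) {
		/* A new second has started: reset the budget. */
		st->window = now;
		st->count = 0;
	}
	if (st->count >= maxpps)
		return 0;
	st->count++;
	return 1;
}
```

A caller would keep one such state per class of event (for example, per kind of error reply) and drop the event when the function returns 0.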


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.219 23-Jan-2018 maxv

Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.


# 1.218 23-Jan-2018 maxv

Fix the check on 'maxlen', we are not creating struct icmp6_hdr but
struct nd_redirect (which is bigger). Also, make sure we can add a
struct nd_opt_rd_hdr.

Normally this doesn't change anything, since the mbuf has IPV6_MMTU
bytes, and it's always way bigger than what we need.


# 1.217 23-Jan-2018 maxv

Fix info leak. We are allocating a slot of size:

roundup(sizeof(*nd_opt) + ifp->if_addrlen, 8)

But we are not filling in the padding caused by the roundup, and therefore
several bytes are leaked, in the mbuf we're about to send to the network.
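
A minimal user-space sketch of the class of fix described above: zero the whole rounded-up slot before filling it, so the padding never carries stale bytes. The ROUNDUP macro, struct opt_hdr, and fill_lladdr_opt() are hypothetical stand-ins, and the 2-byte type/length header is an assumption; this is not the committed change.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's roundup() macro. */
#define ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))

/* Hypothetical option header: type, then length in 8-byte units. */
struct opt_hdr {
	uint8_t type;
	uint8_t len;
};

/*
 * Write a link-layer address option into buf and return the slot size,
 * or 0 if it does not fit. The slot is cleared first, so any padding
 * introduced by the roundup is guaranteed to be zero.
 */
static size_t
fill_lladdr_opt(uint8_t *buf, size_t buflen, uint8_t type,
    const uint8_t *lladdr, size_t addrlen)
{
	size_t len = ROUNDUP(sizeof(struct opt_hdr) + addrlen, 8);

	if (len > buflen || len / 8 > 255)
		return 0;
	memset(buf, 0, len);		/* clears header, address and padding */
	buf[0] = type;
	buf[1] = (uint8_t)(len / 8);
	memcpy(buf + sizeof(struct opt_hdr), lladdr, addrlen);
	return len;
}
```

With a 6-byte address the slot is exactly 8 bytes and has no padding; with an 8-byte address the slot grows to 16 bytes, and the last 6 bytes are padding that would otherwise be left uninitialized.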


# 1.216 23-Jan-2018 maxv

Fix the same mistake twice: 'last' can't be null, so there's no point in
having this misleading branch.


# 1.215 23-Jan-2018 maxv

Style, and four fixes:

* Remove the (disabled) IPPROTO_ESP check. If the packet was decrypted it
will have M_DECRYPTED, and this is already checked.

* Memory leaks in icmp6_error2. They seem hardly triggerable.

* Fix miscomputation in _icmp6_input, the ICMP6 header is not guaranteed
to be located right after the IP6 header. ok mlelstv@

* Memory leak in _icmp6_input. This one seems to be impossible to trigger.


Revision tags: tls-maxphys-base-20171202
# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The obtained outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

Details of the behavior changes can be seen in the diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
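
A small sketch of the conversion described above, assuming the offset between the two clocks is sampled at the moment of conversion. The function names and parameters are hypothetical; now_up and now_wall stand in for the current values of time_uptime and time_second.

```c
#include <time.h>

/* Convert a timestamp kept on the monotonic (uptime) clock to wall-clock. */
static time_t
uptime_to_wall(time_t t_up, time_t now_up, time_t now_wall)
{
	return t_up + (now_wall - now_up);
}

/* Convert a wall-clock timestamp from userland back to the uptime clock. */
static time_t
wall_to_uptime(time_t t_wall, time_t now_up, time_t now_wall)
{
	return t_wall - (now_wall - now_up);
}
```

Because only the difference between the two clocks is applied, an expiry stored as an uptime value keeps the same remaining lifetime even if the wall clock is stepped by settimeofday(2).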


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
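
A minimal userland sketch of why an array of uint64_t's can stay ABI-compatible with an old statistics structure: if every field of the structure is a uint64_t and the array indices follow the field order, the two have the same memory layout. The structure, field names, and index names below are hypothetical and are not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style statistics structure (illustrative names only). */
struct old_stats {
	uint64_t s_error;
	uint64_t s_canterror;
	uint64_t s_toofreq;
};

/* Hypothetical array indices, in the same order as the fields above. */
enum {
	STAT_ERROR = 0,
	STAT_CANTERROR = 1,
	STAT_TOOFREQ = 2,
	STAT_NSTATS = 3
};

/* Copy array-based counters into the buffer an old binary would supply. */
static void
export_stats(const uint64_t stats[STAT_NSTATS], struct old_stats *out)
{
	/* The layouts only match if the structure has no padding. */
	assert(sizeof(*out) == STAT_NSTATS * sizeof(uint64_t));
	memcpy(out, stats, sizeof(*out));
}
```

An old consumer reading out.s_canterror then sees whatever was stored in stats[STAT_CANTERROR].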


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop ( and local dos ) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
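
The alignment test described above can be sketched in userland as follows. This is an illustration only: the macro name, the NO_STRICT_ALIGNMENT_SKETCH switch, and the 4-byte alignment requirement are assumptions of the sketch, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the *_HDR_ALIGNED_P() macros: expands to 1
 * when NO_STRICT_ALIGNMENT_SKETCH is defined, so that such platforms do
 * not pay for the test; otherwise it checks for 4-byte alignment.
 */
#ifdef NO_STRICT_ALIGNMENT_SKETCH
#define HDR_ALIGNED_P(p)	1
#else
#define HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/* Return 1 if the header would have to be copied to an aligned buffer. */
static int
needs_realign(const void *hdr)
{
	assert(hdr != NULL);
	return !HDR_ALIGNED_P(hdr);
}
```

A caller would take the align-if-needed path (for instance a copying routine in the spirit of m_copyup()) only when needs_realign() returns 1.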


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
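
A small sketch of how a bitmap value such as the one described for net.inet6.icmp6.nodeinfo could be tested. The constant and function names are hypothetical; only the bit assignments (2^0 for ping6 -w, 2^1 for ping6 -a) come from the log message above.

```c
#include <assert.h>

/* Hypothetical names for the two bits described in the log message. */
#define NODEINFO_SKETCH_FQDN	0x1	/* 2^0: hostname queries (ping6 -w) */
#define NODEINFO_SKETCH_ADDRS	0x2	/* 2^1: address queries (ping6 -a) */

/* Return 1 if the given query type is enabled in the sysctl value. */
static int
nodeinfo_allows(int sysctl_value, int query_bit)
{
	assert(query_bit == NODEINFO_SKETCH_FQDN ||
	    query_bit == NODEINFO_SKETCH_ADDRS);
	return (sysctl_value & query_bit) != 0;
}
```

With this encoding a value of 3 enables both query types, 1 or 2 enables only one of them, and 0 disables both.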


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
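
The two-watermark rule above can be sketched as a small admission check. This is only an illustration under stated assumptions: the function name `pmtud_may_add` and its parameters are hypothetical and do not come from the kernel source, and treating a negative watermark as "no limit" is an assumption of the sketch.

```c
#include <stdbool.h>

/*
 * Hypothetical admission check for creating a new pmtud route entry.
 * `validated' says whether an upper-layer ctlinput handler accepted the
 * packet; `nentries' is the current number of pmtud route entries.
 * A negative watermark is treated as "no limit" (an assumption here).
 */
static bool
pmtud_may_add(bool validated, int nentries, int hiwat, int lowat)
{
	/* Validated traffic gets the larger budget, unvalidated the smaller. */
	int limit = validated ? hiwat : lowat;

	if (limit < 0)
		return true;
	return nentries < limit;
}
```

Because unvalidated traffic is capped by the lower watermark, forged "too big" messages can consume only a bounded number of route entries.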


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
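
As a rough illustration of event-per-second limiting, the userland sketch below counts events inside the current one-second window and rejects any beyond a cap. The names `pps_state` and `pps_allow` are hypothetical and are not the kernel interface; the caller supplies the clock value so the logic can be exercised without a real timer, and the meaning given to negative and zero caps is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for one rate-limited event class (not the kernel API). */
struct pps_state {
	int64_t window_start;	/* second in which the current window began */
	int count;		/* events admitted in the current window */
};

/*
 * Return true if an event at time `now' (in whole seconds) may proceed.
 * Assumed convention: maxpps < 0 disables the limit, maxpps == 0 blocks
 * every event.
 */
static bool
pps_allow(struct pps_state *st, int64_t now, int maxpps)
{
	if (maxpps < 0)
		return true;
	if (now != st->window_start) {
		/* A new second has started: reset the window. */
		st->window_start = now;
		st->count = 0;
	}
	if (st->count >= maxpps)
		return false;
	st->count++;
	return true;
}
```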


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield changes from the 04 draft to the 05 draft, which make
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.214 05-Nov-2017 ozaki-r

Fix usages of ipsec_used

If IPsec isn't used, we must go back to the normal path.

PR kern/52659


Revision tags: nick-nhusb-base-20170825
# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist,
renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
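
The conversion between the two time bases can be sketched as follows. This is a minimal illustration, not the kernel's code: the helper names are hypothetical, the caller is assumed to pass simultaneous readings of both clocks, and passing an expiry of 0 through unchanged as "never expires" is an assumption of the sketch.

```c
#include <stdint.h>

/*
 * Convert an expiry kept on the monotonic uptime clock into a wall-clock
 * time for userland, given simultaneous readings of both clocks.
 * An expiry of 0 is treated as "never expires" and passed through
 * (an assumption of this sketch).
 */
static int64_t
uptime_to_wallclock(int64_t expire_uptime, int64_t now_uptime, int64_t now_wall)
{
	if (expire_uptime == 0)
		return 0;
	return expire_uptime - now_uptime + now_wall;
}

/* Inverse conversion for values arriving from userland. */
static int64_t
wallclock_to_uptime(int64_t expire_wall, int64_t now_uptime, int64_t now_wall)
{
	if (expire_wall == 0)
		return 0;
	return expire_wall - now_wall + now_uptime;
}
```

Keeping the stored value on the monotonic clock means a settimeofday(2) call changes only the value reported to userland, not when the entry actually expires.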


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
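
The convention described above (lookups return with a reference held, callers release through a single function) can be sketched in plain C. The `struct entry` type and both functions are hypothetical stand-ins, not the kernel's struct rtentry or rtfree(); the `freed` flag replaces a real deallocation so the effect stays observable.

```c
/* Hypothetical reference-counted entry; not the kernel's struct rtentry. */
struct entry {
	int refcnt;	/* references from tables and from local variables */
	int freed;	/* set instead of deallocating, to keep the sketch observable */
};

/* Functions returning an entry take a reference on behalf of the caller. */
static struct entry *
entry_lookup(struct entry *e)
{
	e->refcnt++;
	return e;
}

/* Callers release through one function, never by decrementing directly. */
static void
entry_free(struct entry *e)
{
	if (--e->refcnt == 0)
		e->freed = 1;
}
```

As the message notes, a count like this only keeps the entry alive; it does not protect the entry's contents from concurrent updates.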


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of the
address part at `slenp', if `slenp' is not NULL.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.
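
A toy model of the distinction, assuming the writability test has the shape "too short OR read-only". The struct and function names below are made up for illustration and are not the kernel's mbuf API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an mbuf: only the fields this sketch needs. */
struct toy_mbuf {
	size_t m_len;	/* bytes contiguous in this buffer */
	bool readonly;	/* storage is shared or otherwise not writable */
};

/* Old-style test: pull up only when the first buffer is too short. */
bool
toy_too_short(const struct toy_mbuf *m, size_t len)
{
	assert(m != NULL);
	return m->m_len < len;
}

/* Assumed shape of the newer test: too short OR not writable. */
bool
toy_unwritable(const struct toy_mbuf *m, size_t len)
{
	assert(m != NULL);
	return m->m_len < len || m->readonly;
}
```

A buffer that is long enough but read-only passes the length test and fails the writability test, which is the case the new check makes visible.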


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating, and
freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
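
The sockaddr routines from item 1 can be approximated in userland as below. This is only a sketch under stated assumptions: malloc(3) stands in for the per-family pool, the toy_ names and the fixed-size struct are invented for the example, and the 'flags' argument is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy BSD-style sockaddr with an explicit length byte. */
struct toy_sockaddr {
	uint8_t sa_len;
	uint8_t sa_family;
	unsigned char sa_data[30];
};

/* Allocate a zeroed sockaddr for a family; NULL when memory is exhausted. */
struct toy_sockaddr *
toy_sockaddr_alloc(uint8_t af, uint8_t len)
{
	struct toy_sockaddr *sa = calloc(1, sizeof(*sa));

	if (sa == NULL)
		return NULL;
	sa->sa_family = af;
	sa->sa_len = len;
	return sa;
}

/* Like strcpy(): copy src over dst; the families must already match. */
struct toy_sockaddr *
toy_sockaddr_copy(struct toy_sockaddr *dst, const struct toy_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, sizeof(*dst));
	return dst;
}

/* Like strdup(): allocate a fresh sockaddr and copy src into it. */
struct toy_sockaddr *
toy_sockaddr_dup(const struct toy_sockaddr *src)
{
	struct toy_sockaddr *dst;

	dst = toy_sockaddr_alloc(src->sa_family, src->sa_len);
	return dst == NULL ? NULL : toy_sockaddr_copy(dst, src);
}

/* Give the storage back (to the heap here, to the pool in the kernel). */
void
toy_sockaddr_free(struct toy_sockaddr *sa)
{
	free(sa);
}
```

Callers of the dup and alloc routines must handle a NULL return, which mirrors the NULL checks that item 3 describes for rtcache_getdst().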


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
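
The chaining scheme can be pictured with a toy singly linked list. All names below are invented for the sketch, and the real code keeps its chain per domain rather than in one global.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rtentry { int id; };

/* Toy route cache: a cached route plus a link in the chain. */
struct toy_route {
	struct toy_rtentry *ro_rt;
	struct toy_route *next;
};

static struct toy_route *toy_chain;	/* caches whose ro_rt != NULL */

/* Record a freshly filled cache on the chain. */
void
toy_rtcache(struct toy_route *ro)
{
	assert(ro->ro_rt != NULL);
	ro->next = toy_chain;
	toy_chain = ro;
}

/* Invalidate one cache: forget the route and unlink it from the chain. */
void
toy_rtflush(struct toy_route *ro)
{
	struct toy_route **pp;

	ro->ro_rt = NULL;
	for (pp = &toy_chain; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == ro) {
			*pp = ro->next;
			ro->next = NULL;
			return;
		}
	}
}

/* Invalidate every cache on the chain, e.g. after a route is added. */
void
toy_rtflushall(void)
{
	while (toy_chain != NULL)
		toy_rtflush(toy_chain);
}
```

Because only caches with a non-NULL route sit on the chain, a flush-all leaves every user seeing ro_rt == NULL and forces a fresh lookup.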

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
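
The bitmap encoding of net.inet6.icmp6.nodeinfo can be read as in the sketch below. The macro and function names are hypothetical; only the bit positions come from the entry above.

```c
#include <assert.h>

/* Hypothetical names for the two bits described in the log entry. */
#define TOY_NODEINFO_W	(1 << 0)	/* answer `ping6 -w` queries */
#define TOY_NODEINFO_A	(1 << 1)	/* answer `ping6 -a` queries */

/* Return nonzero when the bitmap permits replying to the given query. */
int
toy_nodeinfo_allowed(int bitmap, int query_bit)
{
	assert(query_bit == TOY_NODEINFO_W || query_bit == TOY_NODEINFO_A);
	return (bitmap & query_bit) != 0;
}
```

Under this reading a value of 3 enables both kinds of reply, 1 or 2 enables one of them, and 0 disables both.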


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
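
The two-tier limit can be sketched as a single predicate. The function name and the idea of passing the current entry count as a parameter are assumptions for this example; the entry above does not say how the count is kept.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * May another PMTUD route entry be created?  Traffic validated by a
 * ctlinput routine is held to the high-water mark, everything else
 * to the low-water mark.
 */
bool
toy_pmtud_may_add(bool validated, int cur_entries, int hiwat, int lowat)
{
	int limit = validated ? hiwat : lowat;

	assert(cur_entries >= 0);
	return cur_entries < limit;
}
```

Keeping lowat below hiwat lets unvalidated "too big" messages still drive path MTU discovery while capping how far they can grow the routing table.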


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
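
An events-per-second limiter of the kind described can be sketched as follows. This is a simplified illustration with invented names and a caller-supplied clock, not the kernel routine or its signature.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <time.h>

struct toy_ppsstate {
	time_t window;	/* second in which events are being counted */
	int count;	/* events seen during that second */
};

/* Return true if one more event is allowed at time `now`. */
bool
toy_ppscheck(struct toy_ppsstate *st, time_t now, int maxpps)
{
	assert(st != NULL);
	if (now != st->window) {	/* a new second: restart the count */
		st->window = now;
		st->count = 0;
	}
	if (st->count < INT_MAX)	/* saturate instead of overflowing */
		st->count++;
	return st->count <= maxpps;
}
```

Passing the time in explicitly keeps the sketch deterministic; a caller would normally supply the current time in seconds.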


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield changes from the 04 draft to the 05 draft, which make
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.213 02-Aug-2017 ozaki-r

Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.


Revision tags: perseant-stdc-iso10646-base
# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

branches: 1.211.6;
Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that a icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
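
The idea behind this change can be sketched as below. This is a minimal illustration under stated assumptions: `clock_pair` is a hypothetical stand-in for the kernel's time_uptime and time_second variables, and the helper names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the two clocks, both in seconds. */
struct clock_pair {
	int64_t uptime;		/* monotonic, like time_uptime */
	int64_t wallclock;	/* may leap, like time_second */
};

/* Store expirations on the monotonic clock. */
static int64_t
expiry_from_lifetime(const struct clock_pair *c, int64_t lifetime)
{
	return c->uptime + lifetime;
}

/* Expiry checks depend only on uptime, so wall-clock leaps are harmless. */
static bool
is_expired(const struct clock_pair *c, int64_t expire_uptime)
{
	return c->uptime >= expire_uptime;
}

/* Convert an uptime-based time for userland that expects wall-clock time. */
static int64_t
uptime_to_wallclock(const struct clock_pair *c, int64_t t_uptime)
{
	return t_uptime - c->uptime + c->wallclock;
}

/* Convert a wall-clock time from userland back to the uptime base. */
static int64_t
wallclock_to_uptime(const struct clock_pair *c, int64_t t_wall)
{
	return t_wall - c->wallclock + c->uptime;
}
```

Because the stored expiry is relative to uptime, a settimeofday(2) leap changes only what the conversion helpers report to userland, not when the entry expires.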


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
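
The reference-counting discipline listed above can be sketched in a simplified form. This is a userland illustration only: `struct entry` and the `entry_*` functions are hypothetical and do not reproduce the kernel's rtentry or rtfree().

```c
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for an rtentry. */
struct entry {
	int refcnt;
	int value;
};

/* Create an entry holding one reference (the table's own). */
static struct entry *
entry_alloc(int value)
{
	struct entry *e = malloc(sizeof(*e));

	if (e == NULL)
		return NULL;
	e->refcnt = 1;
	e->value = value;
	return e;
}

/* Functions that return an entry take a reference on behalf of the caller. */
static struct entry *
entry_lookup(struct entry *e)
{
	e->refcnt++;
	return e;
}

/*
 * Callers always release through this function rather than decrementing
 * the counter directly. Returns 1 if the entry was destroyed.
 */
static int
entry_free(struct entry *e)
{
	if (--e->refcnt == 0) {
		free(e);
		return 1;
	}
	return 0;
}
```

As the entry notes, a count like this only keeps the object alive; it does not protect the object from concurrent updates.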


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
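
Why such a conversion can stay ABI-compatible is sketched below, assuming a hypothetical three-counter structure. The names `old_stat`, `STAT_*`, and `export_stats` are illustrative and are not the kernel's identifiers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style statistics structure: only uint64_t members. */
struct old_stat {
	uint64_t s_error;
	uint64_t s_canterror;
	uint64_t s_toofreq;
};

/* New-style indices, listed in the same order as the structure members. */
enum {
	STAT_ERROR,
	STAT_CANTERROR,
	STAT_TOOFREQ,
	STAT_NSTATS
};

/* The layouts match only if the structure is exactly NSTATS counters wide. */
_Static_assert(sizeof(struct old_stat) == STAT_NSTATS * sizeof(uint64_t),
    "array and structure layouts must match");

/* Hand the counter array to a consumer that still expects the structure. */
static void
export_stats(const uint64_t counters[STAT_NSTATS], struct old_stat *out)
{
	memcpy(out, counters, sizeof(*out));
}
```

Since every member has the same type and the indices follow member order, a binary built against the structure reads the same bytes at the same offsets.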


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.
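
To illustrate the difference between the two tests, here is a minimal
userspace sketch. It assumes that "unwritable" means "too short or
read-only"; struct toy_mbuf and both helper functions are hypothetical
stand-ins and are not the kernel's struct mbuf or M_UNWRITABLE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf; not the kernel's struct mbuf. */
struct toy_mbuf {
	size_t len;      /* bytes stored contiguously in this buffer */
	bool   readonly; /* true if the storage is shared and must not be written */
};

/* Old-style test: only asks whether enough bytes are contiguous. */
static bool
toy_too_short(const struct toy_mbuf *m, size_t len)
{
	return m->len < len;
}

/* New-style test: short OR read-only, i.e. "not safely writable". */
static bool
toy_unwritable(const struct toy_mbuf *m, size_t len)
{
	assert(m != NULL);
	return m->len < len || m->readonly;
}
```

A buffer that is long enough but shared passes the length-only test yet
fails the writability test, which is the case the commit wants the
reader to notice.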


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
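
The strdup()/strcpy() analogy from item 1 can be sketched in userspace
as follows. This is only a model under stated assumptions: struct
toy_sockaddr and the toy_* functions are hypothetical, calloc()/free()
stand in for the per-family pool, the 'flags' argument is omitted, and
assert() stands in for KASSERT.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature sockaddr; the real one lives in <sys/socket.h>. */
struct toy_sockaddr {
	unsigned char sa_len;
	unsigned char sa_family;
	unsigned char sa_data[14];
};

/* Allocate a zeroed sockaddr with length and family filled in, or NULL. */
static struct toy_sockaddr *
toy_sockaddr_alloc(unsigned char af)
{
	struct toy_sockaddr *sa = calloc(1, sizeof(*sa));

	if (sa == NULL)
		return NULL;
	sa->sa_len = (unsigned char)sizeof(*sa);
	sa->sa_family = af;
	return sa;
}

/* Like strcpy(): copy src over dst; both must share a family. */
static struct toy_sockaddr *
toy_sockaddr_copy(struct toy_sockaddr *dst, const struct toy_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like strdup(): allocate a fresh copy of src, or return NULL. */
static struct toy_sockaddr *
toy_sockaddr_dup(const struct toy_sockaddr *src)
{
	struct toy_sockaddr *dst = toy_sockaddr_alloc(src->sa_family);

	return dst == NULL ? NULL : toy_sockaddr_copy(dst, src);
}

/* Return the sockaddr to its (here: malloc-backed) pool. */
static void
toy_sockaddr_free(struct toy_sockaddr *sa)
{
	free(sa);
}
```

As with strdup(), the caller of the dup routine owns the result and is
expected to hand it back with the matching free routine.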


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop ( and local dos ) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
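
The chaining and flush-all idea described above can be modelled in a
few lines. This is a sketch only: struct toy_route, struct toy_domain,
toy_rtcache() and toy_rtflushall() are hypothetical names, the chain is
a plain singly linked list, and no reference counting is modelled.

```c
#include <assert.h>
#include <stddef.h>

struct toy_route {
	const void       *ro_rt;   /* cached route, NULL when invalid */
	struct toy_route *next;    /* link in the domain's chain */
};

struct toy_domain {
	struct toy_route *caches;  /* caches whose ro_rt is non-NULL */
};

/* Record a freshly filled cache on the domain's chain. */
static void
toy_rtcache(struct toy_domain *dom, struct toy_route *ro)
{
	assert(ro->ro_rt != NULL);
	ro->next = dom->caches;
	dom->caches = ro;
}

/* Invalidate every cache on the chain and empty the chain. */
static void
toy_rtflushall(struct toy_domain *dom)
{
	struct toy_route *ro, *next;

	for (ro = dom->caches; ro != NULL; ro = next) {
		next = ro->next;
		ro->ro_rt = NULL;
		ro->next = NULL;
	}
	dom->caches = NULL;
}
```

After a flush, every user of a cache sees ro_rt == NULL and has to look
the route up again, which is why the commit updates callers to expect
that.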


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
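
Reading the nodeinfo bitmap can be sketched as below. The macro and
function names are hypothetical; only the bit assignments (2^0 for
ping6 -w, 2^1 for ping6 -a) come from the message above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the two bits described above. */
#define TOY_NODEINFO_PING6_W	(1 << 0)	/* 2^0: answer ping6 -w */
#define TOY_NODEINFO_PING6_A	(1 << 1)	/* 2^1: answer ping6 -a */

/* True if the given query type is enabled in the sysctl value. */
static bool
toy_nodeinfo_allows(int nodeinfo, int bit)
{
	assert(bit == TOY_NODEINFO_PING6_W || bit == TOY_NODEINFO_PING6_A);
	return (nodeinfo & bit) != 0;
}
```

Under this reading, a value of 1 answers only ping6 -w, 2 answers only
ping6 -a, 3 answers both, and 0 answers neither.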


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
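
The hiwat/lowat rule above can be sketched as a single decision
function. This is an illustration only: toy_pmtud_may_add() and its
parameters are hypothetical, and "allow up to N entries" is read here
as "add only while the current count is below N".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether one more pmtud route entry may be created.
 * validated: true if xx_ctlinput validated the traffic.
 */
static bool
toy_pmtud_may_add(int nentries, bool validated, int hiwat, int lowat)
{
	int limit = validated ? hiwat : lowat;

	assert(nentries >= 0);
	return nentries < limit;
}
```

Because unvalidated traffic is held to the lower limit, forged "too
big" messages can create only a bounded number of route entries.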


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.212 07-Jul-2017 knakahara

fix PR kern/52353. implemented by ozaki-r@n.o. I just commit by proxy.

XXX need to pullup to -8.


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1 jdolecek-ncq-base pgoyette-localcount-20170320
# 1.211 14-Mar-2017 ozaki-r

Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

branches: 1.204.2;
Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.
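
The index-instead-of-pointer idea can be sketched as below. This is a
simplified illustration, not the kernel implementation: all names
(ifnet_sketch, pkt_sketch, index2if, pkt_get_rcvif) are hypothetical, and
the psref(9)/pserialize(9) protection that the real APIs provide is
deliberately left out.

```c
#include <stddef.h>

#define MAX_IFS	8

struct ifnet_sketch {		/* hypothetical stand-in for struct ifnet */
	int if_index;
};

struct pkt_sketch {		/* hypothetical stand-in for an mbuf header */
	int rcvif_index;	/* an index, not a pointer */
};

/* Interface collection, indexed by if_index. */
static struct ifnet_sketch *index2if[MAX_IFS];

static void
if_attach_sketch(struct ifnet_sketch *ifp)
{
	index2if[ifp->if_index] = ifp;
}

static void
if_detach_sketch(struct ifnet_sketch *ifp)
{
	index2if[ifp->if_index] = NULL;
}

/*
 * Resolve the receiving interface of a packet.  Returns NULL if the
 * interface has gone away since the packet was received, so callers
 * must check the result.
 */
static struct ifnet_sketch *
pkt_get_rcvif(const struct pkt_sketch *p)
{
	if (p->rcvif_index < 0 || p->rcvif_index >= MAX_IFS)
		return NULL;
	return index2if[p->rcvif_index];
}
```

A packet that outlives its interface then resolves to NULL rather than to
a dangling pointer, which is why callers of such lookups need NULL checks.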

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are the counterpart of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif operations and reduce the diff
of the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.
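
The conversion between the two timelines amounts to an offset
calculation. The following is a minimal sketch under stated assumptions:
the helper names are hypothetical and the current values of both clocks
are passed in explicitly, so this is not the code used in the kernel.

```c
#include <time.h>

/*
 * An expiration deadline kept on the monotonic (uptime) timeline is not
 * affected when the wall clock is stepped.
 */
static int
expired(time_t expire_uptime, time_t now_uptime)
{
	return now_uptime >= expire_uptime;
}

/* Convert an uptime-based timestamp to wall-clock time for userland. */
static time_t
uptime_to_wallclock(time_t t_uptime, time_t now_uptime, time_t now_wall)
{
	return t_uptime - now_uptime + now_wall;
}

/* The reverse direction, for values handed in by userland. */
static time_t
wallclock_to_uptime(time_t t_wall, time_t now_uptime, time_t now_wall)
{
	return t_wall - now_wall + now_uptime;
}
```

Because the offset is computed at the moment of conversion, a later step
of the wall clock changes only what userland is told, not when the entry
actually expires.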

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
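
The reference-counting convention described above can be sketched like
this. It is an illustration only: struct entry, entry_lookup() and
entry_free() are hypothetical stand-ins, not the rtentry code, and no
locking is shown.

```c
#include <stddef.h>

struct entry {			/* hypothetical stand-in for struct rtentry */
	int refcnt;
	int in_table;		/* nonzero while the table links to it */
	int key;
};

/*
 * Functions returning an entry take a reference on behalf of the
 * caller, so the entry cannot be freed while the caller uses it.
 */
static struct entry *
entry_lookup(struct entry *table, size_t n, int key)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].in_table && table[i].key == key) {
			table[i].refcnt++;
			return &table[i];
		}
	}
	return NULL;
}

/*
 * The single place where a reference is dropped; callers never write
 * refcnt-- themselves and never touch the entry afterwards.
 */
static void
entry_free(struct entry *e)
{
	e->refcnt--;
	/* a real implementation would release the entry once unused */
}
```

As the note above says, holding a reference only keeps the entry alive;
it does not stop another context from updating it.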


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if ipsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
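
Why a structure made only of 64-bit counters can be replaced by an array
without breaking old readers can be sketched as follows. The member and
index names here are hypothetical and much shorter than the real
statistics; the point is only that both layouts occupy the same bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style layout: every member is a 64-bit counter. */
struct old_stat {
	uint64_t s_error;
	uint64_t s_toofreq;
	uint64_t s_badcode;
};

/* New style: indices name the slots of a flat counter array. */
enum {
	STAT_ERROR,
	STAT_TOOFREQ,
	STAT_BADCODE,
	STAT_NSTATS
};

/* The two layouts must agree in size and in slot positions. */
_Static_assert(sizeof(struct old_stat) == STAT_NSTATS * sizeof(uint64_t),
    "array and structure differ in size");
_Static_assert(offsetof(struct old_stat, s_badcode) ==
    STAT_BADCODE * sizeof(uint64_t), "slot order differs");

static void
stat_inc(uint64_t *stats, int idx)
{
	stats[idx]++;
}

/* Export the array to a reader that still expects the old structure. */
static void
stat_export(const uint64_t *stats, struct old_stat *out)
{
	memcpy(out, stats, sizeof(*out));
}
```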


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
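
For illustration, the difference looks like this. The function names are
made up for the example and are not taken from icmp6.c.

```c
/*
 * Old-style (K&R) definition, the form being converted away from:
 *
 *	int
 *	add(a, b)
 *		int a, b;
 *	{
 *		return a + b;
 *	}
 */

/* ANSI definition: parameter types are part of the prototype. */
static int
add(int a, int b)
{
	return a + b;
}

/*
 * (void) states that the function takes no arguments.  Before C23, an
 * empty () in a declaration leaves the parameters unspecified, so the
 * compiler cannot check calls against it.
 */
static int
answer(void)
{
	return 42;
}
```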


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating, and
freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop ( and local dos ) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application do:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
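
The alignment test described above can be sketched as follows. This is an illustration only: SKETCH_HDR_ALIGNED_P and SKETCH_NO_STRICT_ALIGNMENT are hypothetical names, and the 4-byte boundary is an assumption chosen for the example rather than the exact test used by each *_HDR_ALIGNED_P() macro.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical macro modeled on the *_HDR_ALIGNED_P() idea: when the
 * platform has no strict alignment constraints the test collapses to
 * the constant 1, so the compiler can drop the align-if-needed path.
 */
#ifdef SKETCH_NO_STRICT_ALIGNMENT
#define SKETCH_HDR_ALIGNED_P(p) 1
#else
/* Assumption: headers must start on a 4-byte boundary. */
#define SKETCH_HDR_ALIGNED_P(p) ((((uintptr_t)(p)) & 3) == 0)
#endif
```

A caller would take the copying path (m_copyup() in the commit above) only when the macro evaluates to 0.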


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
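
As a rough illustration of the bitmap semantics mentioned above (2^0 for ping6 -w, 2^1 for ping6 -a), the check could look like the sketch below. The macro and function names are hypothetical and are not taken from the kernel source.

```c
#include <assert.h>

/* Hypothetical names for the two bits of net.inet6.icmp6.nodeinfo. */
#define SKETCH_NI_PING6_W (1 << 0)  /* 2^0: queries sent by "ping6 -w" */
#define SKETCH_NI_PING6_A (1 << 1)  /* 2^1: queries sent by "ping6 -a" */

/* Return nonzero if the given query class is enabled in the bitmap. */
static int
sketch_ni_allowed(int nodeinfo, int bit)
{
    return (nodeinfo & bit) != 0;
}
```

Under this reading, a value of 3 enables both query classes, 1 or 2 enables one of them, and 0 disables node information replies entirely.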


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
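
A simplified userland sketch of event-per-second rate limitation, assuming the caller supplies the current time in whole seconds. The struct and function here are hypothetical and do not reproduce the kernel routine's signature; treating a negative limit as "no limit" is also an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical per-limiter state. */
struct sketch_pps {
    long last_sec;  /* second in which the current window started */
    int  count;     /* events already allowed in that window */
};

/*
 * Return 1 if the event may proceed, 0 if it should be suppressed.
 * At most maxpps events are allowed per one-second window.
 */
static int
sketch_pps_check(struct sketch_pps *st, long now_sec, int maxpps)
{
    if (maxpps < 0)
        return 1;               /* assumption: negative disables the limit */
    if (st->last_sec != now_sec) {
        st->last_sec = now_sec; /* new window: restart the count */
        st->count = 0;
    }
    if (st->count >= maxpps)
        return 0;
    st->count++;
    return 1;
}
```

An ICMPv6 error path would call such a check before building a reply and silently drop the error when it returns 0.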


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.
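
The idea can be sketched as below: the first byte of the IPv6 header carries the 4-bit version in its high nibble and the top 4 bits of the traffic class in its low nibble, so the version has to be written with a mask rather than a plain assignment. The SKETCH_* constants and the helper are hypothetical names for illustration, and the code is not the change made in this revision.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_IPV6_VERSION      0x60  /* version 6 in the high nibble */
#define SKETCH_IPV6_VERSION_MASK 0xf0  /* bits occupied by the version */

/* Set the version bits while preserving the traffic class bits. */
static uint8_t
sketch_set_version(uint8_t vfc)
{
    vfc &= (uint8_t)~SKETCH_IPV6_VERSION_MASK;  /* clear only the version */
    vfc |= SKETCH_IPV6_VERSION;                 /* write version 6 */
    return vfc;
}
```

A plain assignment of 0x60 to the byte would zero the low nibble and lose the traffic class bits stored there.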


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.211 14-Mar-2017 ozaki-r

Replace DIAGNOSTIC + panic with CTASSERT


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
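
The idea behind r1.188 (store an interface index in the packet and look the
interface up on use) can be sketched in plain C as below. This is only an
illustration under stated assumptions: struct iface, struct packet, if_table
and the three functions are hypothetical names, not the kernel's, and the
psref(9)/pserialize(9) protection that the real m_get_rcvif variants provide
is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IFS 8			/* hypothetical table size */

struct iface {				/* stand-in for struct ifnet */
	unsigned index;
	const char *name;
};

struct packet {				/* stand-in for an mbuf packet header */
	unsigned rcvif_index;		/* an index, not a pointer */
};

/* Stand-in for the interface collection, indexed by interface index. */
struct iface *if_table[MAX_IFS];

void
iface_attach(struct iface *ifp)
{
	assert(ifp->index < MAX_IFS);
	if_table[ifp->index] = ifp;
}

void
iface_detach(unsigned index)
{
	assert(index < MAX_IFS);
	if_table[index] = NULL;
}

/*
 * Resolve the receiving interface at the time of use. The result is
 * NULL if the interface has gone away, so callers must check it.
 */
struct iface *
packet_get_rcvif(const struct packet *p)
{
	if (p->rcvif_index >= MAX_IFS)
		return NULL;
	return if_table[p->rcvif_index];
}
```

Because a lookup can fail after the interface is detached, every caller
needs a NULL check, which is what the later "Add missing NULL checks"
revisions address.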


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receiving interface of a mbuf.
The functions are counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif handling and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
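
The conversion mentioned in r1.173 can be sketched as plain arithmetic. The
function names below are hypothetical, and the now_uptime and now_wallclock
parameters merely stand in for the kernel's time_uptime and time_second; this
is an illustration of the idea, not the code that was committed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a point in time kept on the monotonic uptime clock into
 * wall-clock seconds, e.g. before handing it to a userland program.
 */
int64_t
uptime_to_wallclock(int64_t t_uptime, int64_t now_uptime,
    int64_t now_wallclock)
{
	assert(now_uptime >= 0);
	return t_uptime - now_uptime + now_wallclock;
}

/* The reverse direction, e.g. for a value supplied by userland. */
int64_t
wallclock_to_uptime(int64_t t_wallclock, int64_t now_uptime,
    int64_t now_wallclock)
{
	assert(now_uptime >= 0);
	return t_wallclock - now_wallclock + now_uptime;
}
```

An expiry stored on the uptime clock keeps the same remaining lifetime when
the wall clock is stepped; only the converted value shown to userland moves.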


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
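
The three rules listed in r1.171 can be sketched with a toy reference-counted
object. The names struct entry, entry_lookup and entry_free are hypothetical
and only mirror the roles of rtentry, a lookup function and rtfree; a flag
replaces the actual release of memory so that the effect can be observed.

```c
#include <assert.h>

struct entry {
	int refcnt;	/* table references plus local-variable references */
	int freed;	/* set where a real implementation would release */
};

/* A function that returns an entry takes a reference for its caller. */
struct entry *
entry_lookup(struct entry *e)
{
	e->refcnt++;
	return e;
}

/*
 * The only way to drop a reference. The caller must not touch the
 * entry after this returns.
 */
void
entry_free(struct entry *e)
{
	assert(e->refcnt > 0);
	if (--e->refcnt == 0)
		e->freed = 1;
}
```

As the commit message notes, a count like this only keeps the object alive;
it does not stop another context from modifying it.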


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
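
Taken together, r1.143 through r1.147 move the counters from a structure to
an array of uint64_t, make them per-CPU, and collate them when they are read.
A minimal sketch of that shape follows. The enum values, NCPU, and the
function names are hypothetical; the kernel uses its own counter indices and
per-CPU machinery.

```c
#include <assert.h>
#include <stdint.h>

enum {				/* hypothetical counter indices */
	STAT_ERROR,
	STAT_TOOSHORT,
	STAT_NSTATS
};

#define NCPU 4			/* hypothetical CPU count */

/* One array of counters per CPU, so updates need no shared lock. */
uint64_t percpu_stats[NCPU][STAT_NSTATS];

void
stat_inc(unsigned cpu, unsigned stat)
{
	assert(cpu < NCPU && stat < STAT_NSTATS);
	percpu_stats[cpu][stat]++;
}

/* Sum the per-CPU arrays into one array for export to userland. */
void
stats_collate(uint64_t out[STAT_NSTATS])
{
	for (unsigned s = 0; s < STAT_NSTATS; s++) {
		out[s] = 0;
		for (unsigned c = 0; c < NCPU; c++)
			out[s] += percpu_stats[c][s];
	}
}
```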


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of the
address part at `slenp', if `slenp' is not NULL.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
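
The strdup()/strcpy() analogy from item 1 above can be sketched in userland C.
Everything here is an assumption made for illustration: struct xsockaddr and
the xsa_* functions are hypothetical stand-ins, and calloc() replaces the
per-family pools described in the commit message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct xsockaddr {			/* stand-in for struct sockaddr */
	unsigned char xsa_len;		/* bytes in use, header included */
	unsigned char xsa_family;
	unsigned char xsa_data[26];
};

/* Like strcpy(): overwrite dst with src; the families must match. */
struct xsockaddr *
xsa_copy(struct xsockaddr *dst, const struct xsockaddr *src)
{
	assert(dst->xsa_family == src->xsa_family);
	assert(src->xsa_len <= sizeof(*dst));
	memcpy(dst, src, src->xsa_len);
	return dst;
}

/* Like strdup(): allocate a new object and copy src into it. */
struct xsockaddr *
xsa_dup(const struct xsockaddr *src)
{
	struct xsockaddr *dst;

	assert(src->xsa_len <= sizeof(*dst));
	dst = calloc(1, sizeof(*dst));
	if (dst == NULL)
		return NULL;	/* allocation failed, as with an empty pool */
	memcpy(dst, src, src->xsa_len);
	return dst;
}
```

A caller releases the duplicate with free() here; in the scheme described
above that role belongs to sockaddr_free().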


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
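
The chaining and invalidation described in this commit can be pictured
with a small userland model. This is only a sketch: the ex_ names and
the struct layouts are hypothetical stand-ins, not the kernel's
struct route or the real in_rtcache()/in_rtflush()/in_rtflushall().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct rtentry and struct route. */
struct ex_rtentry {
	int refcnt;
};

struct ex_route {
	struct ex_rtentry *ro_rt;	/* cached route, or NULL */
	struct ex_route *next;		/* next cache with ro_rt != NULL */
	struct ex_route **prevp;	/* back pointer for O(1) unlink */
};

static struct ex_route *ex_chain;	/* head of the per-domain chain */

/* Put a freshly cached route on the chain (models in_rtcache()). */
static void
ex_rtcache(struct ex_route *ro)
{
	assert(ro->ro_rt != NULL);
	ro->next = ex_chain;
	ro->prevp = &ex_chain;
	if (ex_chain != NULL)
		ex_chain->prevp = &ro->next;
	ex_chain = ro;
}

/* Invalidate one cache: drop the reference, clear ro_rt, unlink. */
static void
ex_rtflush(struct ex_route *ro)
{
	if (ro->ro_rt == NULL)
		return;
	ro->ro_rt->refcnt--;
	ro->ro_rt = NULL;
	*ro->prevp = ro->next;
	if (ro->next != NULL)
		ro->next->prevp = ro->prevp;
}

/* Invalidate every cache on the chain (models in_rtflushall()). */
static void
ex_rtflushall(void)
{
	while (ex_chain != NULL)
		ex_rtflush(ex_chain);
}
```

After ex_rtflush() or ex_rtflushall(), a user of the cache sees
ro_rt == NULL and has to do a fresh lookup, which is the behaviour the
callers were updated to expect.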


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
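
The stricter scope boundary check for the ::1 case can be sketched as
below. ex_scope_ok() is a hypothetical helper written for this note; it
is not the kernel's scope-checking code and covers only the loopback
case mentioned above.

```c
#include <netinet/in.h>
#include <stdbool.h>

/*
 * Hypothetical check: reject a loopback source paired with a
 * non-loopback destination, since ::1 is only meaningful within a
 * single node.  Other scope rules are deliberately left out.
 */
static bool
ex_scope_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
	if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
		return false;
	return true;
}
```

With this rule the bind("::1") followed by sendto(global address)
sequence would be refused instead of putting the packet on the wire.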

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), it is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.
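
A minimal sketch of what such an alignment macro might look like
follows. EX_HDR_ALIGNED_P is a hypothetical name, and the 4-byte
requirement is an assumption of this sketch rather than something the
commit message states.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the *_HDR_ALIGNED_P() family.  On
 * architectures that define __NO_STRICT_ALIGNMENT the macro is the
 * constant 1, so the compiler can drop the align-if-needed path.
 */
#ifdef __NO_STRICT_ALIGNMENT
#define EX_HDR_ALIGNED_P(p)	1
#else
#define EX_HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif
```

A caller would test the macro on the header pointer and only take the
copy-and-align path (m_copyup() in the commit) when it yields 0.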

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
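
The bitmap interpretation of net.inet6.icmp6.nodeinfo can be
illustrated as follows. The macro and function names are hypothetical
and chosen for this note; only the bit positions come from the commit
message.

```c
#include <stdbool.h>

/* Hypothetical names for the two bits described above. */
#define EX_NODEINFO_PING6_W	(1 << 0)	/* 2^0: ping6 -w queries */
#define EX_NODEINFO_PING6_A	(1 << 1)	/* 2^1: ping6 -a queries */

/* Is the given kind of query enabled by the sysctl value? */
static bool
ex_nodeinfo_enabled(int sysctl_value, int query_bit)
{
	return (sysctl_value & query_bit) != 0;
}
```

Under this reading a value of 3 enables both kinds of query, 1 only
the ping6 -w kind, and 0 neither.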


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame
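
The hiwat/lowat policy above can be sketched as a single admission
check. ex_pmtud_may_add() is a hypothetical helper, and the handling of
negative limits follows the later 1.55 commit message; the real code
may differ in details such as victim selection.

```c
#include <stdbool.h>

/*
 * Hypothetical admission check for a new PMTUD route entry.
 * validated: the too-big message was validated by xx_ctlinput.
 * rtcount:   number of PMTUD route entries that already exist.
 */
static bool
ex_pmtud_may_add(bool validated, int rtcount, int hiwat, int lowat)
{
	int limit = validated ? hiwat : lowat;

	/* A negative limit turns the limitation off. */
	if (limit < 0)
		return true;
	return rtcount < limit;
}
```

Because unvalidated traffic is held to the lower limit, forged too-big
messages can fill at most lowat entries of the routing table.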

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
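
Event-per-second rate limitation of this kind can be sketched with a
fixed one-second window. This is an illustration only: struct ex_pps
and ex_ppsratecheck() are hypothetical, the caller supplies the clock,
and the kernel routine's exact signature and semantics are not shown
here.

```c
#include <stdbool.h>

/* Hypothetical limiter state: current second and events seen in it. */
struct ex_pps {
	long last_sec;
	int count;
};

/*
 * Return true if the event may proceed, false if it exceeds maxpps
 * events within the current second.
 */
static bool
ex_ppsratecheck(struct ex_pps *st, long now_sec, int maxpps)
{
	if (now_sec != st->last_sec) {
		/* New second: start a fresh window. */
		st->last_sec = now_sec;
		st->count = 0;
	}
	if (st->count >= maxpps)
		return false;
	st->count++;
	return true;
}
```

An ICMPv6 error path would call the check before building an error
packet and silently drop the error when it returns false.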


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which caontains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.
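
The fix amounts to masking before setting: the first byte of the IPv6
header carries the 4-bit version in its high nibble and the upper four
traffic class bits in its low nibble. A minimal sketch, with
hypothetical EX_ names rather than the definitions from netinet/ip6.h:

```c
#include <stdint.h>

#define EX_IPV6_VERSION		0x60	/* version 6 in the high nibble */
#define EX_IPV6_VERSION_MASK	0xf0

/*
 * Write the version into the first header byte while keeping the
 * traffic class bits that share that byte.
 */
static uint8_t
ex_set_version(uint8_t vfc)
{
	return (uint8_t)((vfc & ~EX_IPV6_VERSION_MASK) | EX_IPV6_VERSION);
}
```

A plain assignment of the version constant to the byte would zero the
low nibble, which is the kind of overwrite this commit avoids.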


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.210 17-Feb-2017 ozaki-r

Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
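
The index-instead-of-pointer idea described above can be sketched in
userland C. This is only an illustration: every sketch_* name is
hypothetical, the table is merely modelled on ifindex2ifnet, and the
psref(9)/pserialize(9) protection that the real m_get_rcvif* functions
provide is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct ifnet or struct mbuf. */
struct sketch_ifnet {
	unsigned int if_index;
	const char *if_xname;
};

#define SKETCH_MAXIF 8

/* Index-to-interface table, loosely modelled on ifindex2ifnet. */
static struct sketch_ifnet *sketch_ifindex2ifnet[SKETCH_MAXIF];

struct sketch_pkt {
	unsigned int rcvif_index;	/* an index is stored, never a pointer */
};

static void
sketch_if_attach(struct sketch_ifnet *ifp)
{
	assert(ifp->if_index < SKETCH_MAXIF);
	sketch_ifindex2ifnet[ifp->if_index] = ifp;
}

static void
sketch_if_detach(struct sketch_ifnet *ifp)
{
	assert(ifp->if_index < SKETCH_MAXIF);
	sketch_ifindex2ifnet[ifp->if_index] = NULL;
}

/* Returns NULL when the interface went away after the packet arrived. */
static struct sketch_ifnet *
sketch_get_rcvif(const struct sketch_pkt *p)
{
	if (p->rcvif_index >= SKETCH_MAXIF)
		return NULL;
	return sketch_ifindex2ifnet[p->rcvif_index];
}
```

Because the lookup can fail, every caller has to handle a NULL result,
which is the kind of check that r1.190 later adds for
m_get_rcvif_psref.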


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receiving interface of a mbuf.
The functions are counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif operations and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The obtained outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can learn the details of the behavior changes from the diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
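
A minimal sketch of the conversion described above, assuming a deadline
is kept in uptime seconds inside the kernel and translated only at the
userland boundary. The sketch_* helpers are hypothetical and take the
current uptime and wall-clock time as plain arguments; they are not the
functions used in the tree.

```c
#include <assert.h>
#include <time.h>

/* Deadline stored as uptime -> wall-clock value handed to userland. */
static time_t
sketch_uptime_to_wall(time_t deadline_uptime, time_t now_uptime,
    time_t now_wall)
{
	return deadline_uptime - now_uptime + now_wall;
}

/* Wall-clock value received from userland -> uptime-based deadline. */
static time_t
sketch_wall_to_uptime(time_t deadline_wall, time_t now_uptime,
    time_t now_wall)
{
	return deadline_wall - now_wall + now_uptime;
}

/* Expiry is judged against the monotonic clock only. */
static int
sketch_expired(time_t deadline_uptime, time_t now_uptime)
{
	return now_uptime >= deadline_uptime;
}
```

Since sketch_expired() never looks at the wall clock, a
settimeofday(2) jump cannot expire an entry early or keep it alive
longer than intended.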


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safety, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
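
The three conventions listed above can be illustrated with a small
userland sketch. The sketch_* types and functions are hypothetical and
much simpler than the real rtentry and rtfree(); the sketch shows only
the counting discipline, not the routing code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rtentry. */
struct sketch_rtentry {
	int rt_refcnt;
};

static int sketch_nfreed;	/* counts entries actually released */

static struct sketch_rtentry *
sketch_rt_create(void)
{
	struct sketch_rtentry *rt = calloc(1, sizeof(*rt));

	if (rt != NULL)
		rt->rt_refcnt = 1;	/* reference held by the table */
	return rt;
}

/* A function returning an entry takes a reference for its caller. */
static struct sketch_rtentry *
sketch_rt_lookup(struct sketch_rtentry *table_entry)
{
	if (table_entry != NULL)
		table_entry->rt_refcnt++;
	return table_entry;
}

/* The only way to drop a reference; callers never touch rt_refcnt. */
static void
sketch_rtfree(struct sketch_rtentry *rt)
{
	assert(rt != NULL && rt->rt_refcnt > 0);
	if (--rt->rt_refcnt == 0) {
		free(rt);
		sketch_nfreed++;
	}
}
```

As the message notes, the count only keeps the entry from being freed;
it does nothing to stop concurrent updates.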


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if ipsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
Add 4 new sysctls, taken from OpenBSD, to avoid IPv6 DoS attacks.


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
Should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
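
Why an array of uint64_t can stay ABI-compatible with the old structure
can be sketched as follows. The field names and indices are made up for
illustration and are not the real icmp6stat layout; the sketch assumes
only that the old structure consisted solely of uint64_t members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style structure made only of uint64_t counters. */
struct sketch_oldstat {
	uint64_t s_error;
	uint64_t s_toofreq;
	uint64_t s_badcode;
};

/* New-style indices, in the same order as the old fields. */
enum {
	SKETCH_STAT_ERROR = 0,
	SKETCH_STAT_TOOFREQ = 1,
	SKETCH_STAT_BADCODE = 2,
	SKETCH_NSTATS = 3
};

static uint64_t sketch_stats[SKETCH_NSTATS];

static void
sketch_statinc(unsigned int idx)
{
	assert(idx < SKETCH_NSTATS);
	sketch_stats[idx]++;
}

/* Copy the array out byte for byte, as an old consumer would read it. */
static struct sketch_oldstat
sketch_export(void)
{
	struct sketch_oldstat out;

	memcpy(&out, sketch_stats, sizeof(out));
	return out;
}
```

Equal-sized members leave no padding between fields, so each array slot
lands at the offset of the corresponding old field.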


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.
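
The difference between the two checks can be sketched as below,
assuming (as the message implies) that M_UNWRITABLE(m, len) is true
either when fewer than len bytes are contiguous or when the storage is
read-only. The sketch_mbuf type and both helpers are hypothetical
stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf; the real struct mbuf is richer. */
struct sketch_mbuf {
	size_t m_len;		/* bytes in the first, contiguous buffer */
	bool m_readonly;	/* storage is shared and must not be modified */
};

/* Old-style test: asks only whether enough bytes are contiguous. */
static bool
sketch_too_short(const struct sketch_mbuf *m, size_t len)
{
	return m->m_len < len;
}

/* New-style test: also true when the region exists but is read-only. */
static bool
sketch_unwritable(const struct sketch_mbuf *m, size_t len)
{
	return m->m_len < len || m->m_readonly;
}
```

A long enough but shared buffer passes the old test and fails the new
one, which is the case where pulling up is still needed before writing.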


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
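
The stricter scope boundary check described in the list above can be
pictured with a small userland sketch. It is only an illustration under
stated assumptions: the helper names ex_scope_ok() and ex_parse() are
hypothetical, and the real kernel check works on scope zones in general
rather than on the loopback address alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Hypothetical scope check: a loopback source (::1) is only
 * meaningful inside the node, so it is rejected unless the
 * destination is loopback as well.  Other scopes are not modelled.
 */
static bool
ex_scope_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
	assert(src != NULL && dst != NULL);
	if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
		return false;
	return true;
}

/* Parse a textual IPv6 address; returns true on success. */
static bool
ex_parse(const char *text, struct in6_addr *out)
{
	return inet_pton(AF_INET6, text, out) == 1;
}
```

With this sketch, a packet with src=::1 and a global destination such as
2001:db8::1 (a documentation address) fails the check, while ::1 to ::1
passes.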

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
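
As a rough illustration of the *_HDR_ALIGNED_P() idea described above,
the sketch below tests whether a header pointer is 4-byte aligned and
falls back to copying into an aligned buffer when it is not. The names
EX_HDR_ALIGNED_P, EX_NO_STRICT_ALIGNMENT and ex_align_hdr() and the
4-byte requirement are assumptions made for this example; they are not
the kernel definitions, and the copy only loosely mirrors what
m_copyup() does on an mbuf chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the *_HDR_ALIGNED_P() macros.  When
 * EX_NO_STRICT_ALIGNMENT is defined the test expands to 1, so the
 * align-if-needed path below is never taken.
 */
#ifdef EX_NO_STRICT_ALIGNMENT
#define EX_HDR_ALIGNED_P(p)	1
#else
#define EX_HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/*
 * Return a pointer to an aligned view of the header at 'hdr'.  An
 * already aligned header is returned unchanged; otherwise its bytes
 * are copied into the caller-supplied aligned scratch buffer.
 */
static const void *
ex_align_hdr(const void *hdr, size_t len, uint32_t *scratch,
    size_t scratch_len)
{
	if (EX_HDR_ALIGNED_P(hdr))
		return hdr;
	assert(len <= scratch_len);
	memcpy(scratch, hdr, len);
	return scratch;
}
```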


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame
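
The hiwat/lowat rule above can be sketched as a simple admission test.
This is a hypothetical illustration only: ex_pmtud_admit() and its
parameters are invented names, and the sketch does not model how the
kernel counts or expires pmtud route entries.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether one more pmtud route entry may be created.
 * Traffic validated by a ctlinput handler is held to the higher
 * limit (hiwat); unvalidated traffic is held to the lower one
 * (lowat), so bogus packets cannot grow the routing table freely.
 */
static bool
ex_pmtud_admit(int cur_entries, bool validated, int hiwat, int lowat)
{
	int limit = validated ? hiwat : lowat;

	assert(cur_entries >= 0);
	return cur_entries < limit;
}
```

For example, with hiwat=1280 and lowat=256 (values chosen for the
example), 300 existing entries still leave room for validated traffic
but not for unvalidated traffic.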

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
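
A minimal sketch of event-per-second rate limiting, in the spirit of
the commit above. It is not the kernel routine: the state layout, the
name ex_pps_check() and the explicit 'now' argument (whole seconds,
passed in so the behaviour is deterministic) are assumptions made for
illustration, as is the treatment of negative and zero limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-limiter state; not the kernel's data layout. */
struct ex_pps_state {
	long window_start;	/* second in which the current window began */
	int  count;		/* events allowed so far in this window */
};

/*
 * Return true if an event at time 'now' may proceed, false if it
 * would exceed 'maxpps' events within the current second.  In this
 * sketch a negative maxpps disables the limit and zero blocks
 * every event.
 */
static bool
ex_pps_check(struct ex_pps_state *st, long now, int maxpps)
{
	assert(st != NULL);
	if (now != st->window_start) {
		/* A new one-second window starts; reset the counter. */
		st->window_start = now;
		st->count = 0;
	}
	if (maxpps < 0)
		return true;
	if (st->count >= maxpps)
		return false;
	st->count++;
	return true;
}
```

A caller would keep one struct ex_pps_state per limited event class
(for instance, ICMPv6 errors) and drop the event when the check
returns false.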


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.209 13-Feb-2017 ozaki-r

Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that a icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
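
The index-instead-of-pointer idea can be sketched as follows. This is a
single-threaded toy, so it deliberately omits the psref(9) and
pserialize(9) protection that the real m_{get,put}_rcvif* APIs provide;
every ex_* name and the table size are hypothetical. It mainly shows
why callers must handle a NULL result once the interface has gone away.

```c
#include <assert.h>
#include <stddef.h>

#define EX_MAXIF 8	/* arbitrary table size for this sketch */

struct ex_ifnet {
	int         if_index;
	const char *if_xname;
};

/* The toy mbuf records only the index of the receiving interface. */
struct ex_mbuf {
	int rcvif_index;
};

/* Toy interface collection, indexed by interface index. */
static struct ex_ifnet *ex_ifindex2ifnet[EX_MAXIF];

static void
ex_if_attach(struct ex_ifnet *ifp)
{
	assert(ifp->if_index >= 0 && ifp->if_index < EX_MAXIF);
	ex_ifindex2ifnet[ifp->if_index] = ifp;
}

static void
ex_if_detach(struct ex_ifnet *ifp)
{
	assert(ifp->if_index >= 0 && ifp->if_index < EX_MAXIF);
	ex_ifindex2ifnet[ifp->if_index] = NULL;
}

/* Look the interface up by index; NULL means it no longer exists. */
static struct ex_ifnet *
ex_get_rcvif(const struct ex_mbuf *m)
{
	if (m->rcvif_index < 0 || m->rcvif_index >= EX_MAXIF)
		return NULL;
	return ex_ifindex2ifnet[m->rcvif_index];
}
```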


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, say by settimeofday(2), according to time_second(9). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
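
The conversion between the two clocks described above can be sketched in
userland C. This is only an illustration: the helper names are hypothetical,
not the kernel's actual API, and the current readings of both clocks are
passed in explicitly rather than read from time_second and time_uptime.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not the kernel's actual API: convert an expiry
 * time kept on the monotonic uptime clock to wall-clock time and back.
 * now_wall and now_up are the current readings of the two clocks.
 */
static int64_t
uptime_to_wallclock(int64_t t_up, int64_t now_wall, int64_t now_up)
{
	/* The remaining interval (t_up - now_up) is the same on both clocks. */
	return now_wall + (t_up - now_up);
}

static int64_t
wallclock_to_uptime(int64_t t_wall, int64_t now_wall, int64_t now_up)
{
	return now_up + (t_wall - now_wall);
}
```

Because only the difference between the two clocks is used, a later wall-clock
leap changes what userland sees but not when the entry actually expires.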


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was misused in several ways, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
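
The discipline listed above can be sketched with a toy structure. This is
a userland illustration under stated assumptions: struct toy_rtentry,
toy_rtlookup() and toy_rtfree() are hypothetical stand-ins, and the sketch
models only the counting, not the kernel's locking or memory management.

```c
#include <assert.h>

/* Toy stand-in for struct rtentry; only the reference count is modelled. */
struct toy_rtentry {
	int refcnt;
	int freed;	/* set instead of releasing memory, so callers can look */
};

/* Lookup-style function: returns the entry with its count incremented. */
static struct toy_rtentry *
toy_rtlookup(struct toy_rtentry *rt)
{
	rt->refcnt++;
	return rt;
}

/* Release a reference; the entry is reclaimed when the last one goes. */
static void
toy_rtfree(struct toy_rtentry *rt)
{
	assert(rt->refcnt > 0);
	if (--rt->refcnt == 0)
		rt->freed = 1;
}
```

Every toy_rtlookup() is paired with exactly one toy_rtfree(), and no caller
touches the count directly.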


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if ipsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
Should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
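
Why an array of uint64_t can stay ABI-compatible with a structure of
counters can be sketched as follows. The two-counter structure and all
names here are hypothetical and only illustrate the layout argument; the
real statistics have many more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-counter example; the real statistics have many more. */
struct toy_stat {
	uint64_t s_error;
	uint64_t s_tooshort;
};

enum { TOY_STAT_ERROR = 0, TOY_STAT_TOOSHORT = 1, TOY_NSTATS = 2 };

/* The array form has the same size, and each index matches a field offset. */
_Static_assert(sizeof(struct toy_stat) == TOY_NSTATS * sizeof(uint64_t),
    "array and structure must have the same size");
_Static_assert(offsetof(struct toy_stat, s_tooshort) ==
    TOY_STAT_TOOSHORT * sizeof(uint64_t), "index must match field offset");

/* Bump one counter in the array form. */
static void
toy_statinc(uint64_t stats[TOY_NSTATS], int idx)
{
	stats[idx]++;
}

/* Hand the array to a consumer that still expects the old structure. */
static void
toy_stat_export(const uint64_t stats[TOY_NSTATS], struct toy_stat *out)
{
	memcpy(out, stats, sizeof(*out));
}
```

A consumer built against the old structure reads the same bytes at the same
offsets, which is the property the commit message relies on.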


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of the
address part at `slenp', if `slenp' is not NULL.
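
The address-part accessors described in this commit can be illustrated in
userland C. This is a sketch, not the kernel implementation:
demo_sockaddr_const_addr() is a hypothetical name, and it handles only
AF_INET and AF_INET6 directly instead of dispatching through the domain.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Userland illustration only (hypothetical name): return a pointer to the
 * address part of an AF_INET or AF_INET6 sockaddr and, if slenp is not
 * NULL, store the length of that part there.  Other families yield NULL.
 */
static const void *
demo_sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)
{
	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;

		if (slenp != NULL)
			*slenp = sizeof(sin->sin_addr);
		return &sin->sin_addr;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;

		if (slenp != NULL)
			*slenp = sizeof(sin6->sin6_addr);
		return &sin6->sin6_addr;
	}
	default:
		return NULL;
	}
}
```

Callers can then compare or copy addresses without knowing the layout of
each family's sockaddr.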


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.
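
The intent of the check can be sketched with a toy buffer. The structure
and function below are hypothetical and much simpler than a real mbuf; they
only show why "too short" and "read-only" are tested together.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an mbuf: only what the check needs. */
struct toy_mbuf {
	size_t len;	/* bytes that are contiguous in this buffer */
	bool readonly;	/* storage is shared, so it must not be modified */
};

/*
 * Toy analogue of the check: the first `len' bytes are unusable for
 * in-place modification if they are not all contiguous, or if the
 * storage is read-only.
 */
static bool
toy_unwritable(const struct toy_mbuf *m, size_t len)
{
	return m->len < len || m->readonly;
}
```

A caller would pull up only when toy_unwritable() returns true, which makes
the purpose of the pull-up explicit at the call site.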


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
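
The contract described in item 3 can be sketched in userland C: a route
cache that points at its destination instead of embedding it, with a
setter that can fail. All names are hypothetical toy versions, and
malloc(3) stands in for the per-family sockaddr pool.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Toy sockaddr with a BSD-style length byte; not the kernel structure. */
struct toy_sockaddr {
	unsigned char len;
	unsigned char family;
	unsigned char data[14];
};

/* Toy route cache that points at, rather than embeds, its destination. */
struct toy_route {
	struct toy_sockaddr *dst;
};

/* Duplicate src with malloc(3) standing in for the per-family pool. */
static struct toy_sockaddr *
toy_sockaddr_dup(const struct toy_sockaddr *src)
{
	struct toy_sockaddr *sa = malloc(sizeof(*sa));

	if (sa != NULL)
		memcpy(sa, src, sizeof(*sa));
	return sa;
}

/* Set the cache destination: 0 on success, ENOMEM if allocation fails. */
static int
toy_rtcache_setdst(struct toy_route *ro, const struct toy_sockaddr *dst)
{
	struct toy_sockaddr *sa = toy_sockaddr_dup(dst);

	if (sa == NULL)
		return ENOMEM;
	free(ro->dst);		/* free(NULL) is a no-op */
	ro->dst = sa;
	return 0;
}
```

Since the setter can fail, a reader of ro->dst has to be prepared for NULL,
which mirrors the NULL checks mentioned for rtcache_getdst().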


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
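
The chaining and invalidation scheme described above can be sketched with
a singly linked list. This is a toy model with hypothetical names; it
ignores reference counting on the route itself and keeps one global chain
instead of one per domain.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rtentry { int unused; };

/* Toy route cache, linked into a chain while ro_rt != NULL. */
struct toy_route {
	struct toy_rtentry *ro_rt;
	struct toy_route *next;
};

static struct toy_route *toy_chain;	/* head of the chain */

/* Register a cache that now holds a route. */
static void
toy_rtcache(struct toy_route *ro)
{
	ro->next = toy_chain;
	toy_chain = ro;
}

/* Invalidate every cache on the chain and empty the chain. */
static void
toy_rtflushall(void)
{
	struct toy_route *ro;

	while ((ro = toy_chain) != NULL) {
		toy_chain = ro->next;
		ro->ro_rt = NULL;
		ro->next = NULL;
	}
}
```

After toy_rtflushall(), every user sees ro_rt == NULL and has to look the
route up again, which is why users must expect a cached route to turn to NULL.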


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
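
The stricter boundary check for the ::1 case in the list above can be
sketched in userland C. toy_scope_ok() is a hypothetical helper that covers
only the loopback example; the kernel's scope handling is far more general.

```c
#include <assert.h>
#include <stdbool.h>
#include <netinet/in.h>

/*
 * Hypothetical check, much simpler than the kernel's scope handling:
 * a packet whose source is the loopback address ::1 may only go to ::1.
 */
static bool
toy_scope_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
	if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
		return false;
	return true;
}
```

With such a check in the output path, the bind("::1") followed by sendto()
to a global address would be rejected instead of leaving the node.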

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.
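
The class of bug described here, a pointer into a buffer that is freed
before the pointer's last use, can be sketched in userland C. The names are
hypothetical and the sketch is not the icmp6_redirect_output() code; it only
shows the safe pattern of reading from the buffer that is still alive.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_hdr { unsigned char src[16]; };

/*
 * Build an outgoing buffer from an incoming one, free the incoming one,
 * and return the outgoing buffer.  After this call only the returned
 * buffer may be dereferenced; pointers into `in' are dangling.
 */
static struct toy_hdr *
toy_build_reply(struct toy_hdr *in)
{
	struct toy_hdr *out = malloc(sizeof(*out));

	if (out != NULL)
		memcpy(out, in, sizeof(*out));
	free(in);
	return out;
}
```

Any later use of the header should go through the returned buffer, in the
same way the fix refers to the data area of m rather than m0.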


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.208 07-Feb-2017 ozaki-r

Add missing NULL checks for m_get_rcvif


Revision tags: nick-nhusb-base-20170204
# 1.207 02-Feb-2017 ozaki-r

Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are the counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif operations and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The returned outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist,
renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
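One way to see why such a conversion can stay ABI-compatible: a structure made up only of 64-bit counters has the same memory layout as an array of uint64_t whose indices follow the field order. The sketch below uses hypothetical names (`toy_icmp6stat`, `TOY_STAT_*`) and only three counters; it is not the kernel's definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "old" layout: one named 64-bit counter per field. */
struct toy_icmp6stat {
	uint64_t error;
	uint64_t canterror;
	uint64_t toofreq;
};

/* Hypothetical "new" layout: indices in the same order as the fields. */
enum {
	TOY_STAT_ERROR = 0,
	TOY_STAT_CANTERROR = 1,
	TOY_STAT_TOOFREQ = 2,
	TOY_NSTATS = 3
};

/* Compile-time checks that both views describe the same bytes. */
_Static_assert(sizeof(struct toy_icmp6stat) == TOY_NSTATS * sizeof(uint64_t),
    "struct and array must have the same size");
_Static_assert(offsetof(struct toy_icmp6stat, toofreq) ==
    TOY_STAT_TOOFREQ * sizeof(uint64_t),
    "field offset must match array index");

/* Bump one counter in the array form. */
static void
toy_stat_inc(uint64_t stats[TOY_NSTATS], int idx)
{
	stats[idx]++;
}

/* Hand the array to a consumer that still expects the old structure. */
static void
toy_stat_export(const uint64_t stats[TOY_NSTATS], struct toy_icmp6stat *out)
{
	memcpy(out, stats, sizeof(*out));
}
```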


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
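A small illustration of the distinction, with made-up function names. The old-style form appears only inside a comment, since recent C standards no longer accept it.

```c
/*
 * Old-style (K&R) definition, shown only as a comment:
 *
 *	int
 *	add(a, b)
 *		int a, b;
 *	{
 *		return a + b;
 *	}
 */

/* ANSI definition: the parameter types are part of the prototype. */
static int
add(int a, int b)
{
	return a + b;
}

/*
 * `(void)` states "takes no parameters". Before C23, an empty `()`
 * in a declaration left the parameters unspecified instead.
 */
static int
answer(void)
{
	return add(40, 2);
}
```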


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.
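To make the difference between the two tests concrete, here is a toy model. It assumes that M_UNWRITABLE(m, len) is true when fewer than len bytes are contiguous or when the storage is read-only; the real definition lives in sys/mbuf.h. `struct toy_mbuf` and both helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an mbuf: only the two properties that matter here. */
struct toy_mbuf {
	size_t len;	/* contiguous bytes in this buffer */
	bool readonly;	/* storage is shared or otherwise not writable */
};

/* Old-style test: only asks whether enough bytes are contiguous. */
static bool
toy_too_short(const struct toy_mbuf *m, size_t len)
{
	return m->len < len;
}

/* Sketch of the idea behind M_UNWRITABLE(m, len): too short OR read-only. */
static bool
toy_unwritable(const struct toy_mbuf *m, size_t len)
{
	return m->len < len || m->readonly;
}
```

A buffer that is long enough but read-only passes the first test and fails the second, which is exactly the case where a pull-up is still needed before writing.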


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
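The strdup()/strcpy() analogy from item 1 can be sketched in userland C. This is only an illustration: malloc() and free() stand in for the per-family pools, assert() stands in for KASSERT, and `struct toy_sockaddr` with its helpers is hypothetical rather than the kernel's sockaddr code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy sockaddr with BSD-style length and family bytes up front. */
struct toy_sockaddr {
	uint8_t sa_len;
	uint8_t sa_family;
	uint8_t sa_data[14];
};

/* Like strdup(): allocate a copy, or return NULL if allocation fails. */
static struct toy_sockaddr *
toy_sockaddr_dup(const struct toy_sockaddr *src)
{
	struct toy_sockaddr *dst = malloc(sizeof(*dst));

	if (dst == NULL)
		return NULL;
	memcpy(dst, src, sizeof(*dst));
	return dst;
}

/* Like strcpy(): overwrite dst with src; the families must already match. */
static struct toy_sockaddr *
toy_sockaddr_copy(struct toy_sockaddr *dst, const struct toy_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, sizeof(*dst));
	return dst;
}

/* Return the storage to the allocator (a pool, in the kernel's case). */
static void
toy_sockaddr_free(struct toy_sockaddr *sa)
{
	free(sa);
}
```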


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop ( and local dos ) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
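The ::1 case described above can be pictured with a deliberately simplified userland check. It uses the standard IN6_IS_ADDR_LOOPBACK() macro and inet_pton(); the rule it enforces (a loopback source may only reach the loopback destination) is a simplification for illustration, and `toy_scope_ok` and `toy_scope_ok_str` are hypothetical names rather than kernel functions.

```c
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/*
 * Hypothetical check, not the kernel's code: a packet whose source is
 * the loopback address may only go to the loopback address.
 */
static bool
toy_scope_ok(const struct in6_addr *src, const struct in6_addr *dst)
{
	if (IN6_IS_ADDR_LOOPBACK(src) && !IN6_IS_ADDR_LOOPBACK(dst))
		return false;
	return true;
}

/* Convenience wrapper taking textual addresses; false on parse errors. */
static bool
toy_scope_ok_str(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return false;
	return toy_scope_ok(&s, &d);
}
```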

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
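The shape of such an alignment macro can be sketched as follows. This is a hypothetical analogue, assuming 4-byte header alignment and using `TOY_NO_STRICT_ALIGNMENT` in place of __NO_STRICT_ALIGNMENT; the real macros are defined in the kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical analogue of the *_HDR_ALIGNED_P() macros. Defining
 * TOY_NO_STRICT_ALIGNMENT turns the test into the constant 1, so the
 * compiler can drop the realignment path entirely.
 */
#ifdef TOY_NO_STRICT_ALIGNMENT
#define TOY_HDR_ALIGNED_P(p)	1
#else
#define TOY_HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/* True when a header at `hdr` would have to be copied to an aligned spot. */
static bool
toy_needs_realign(const void *hdr)
{
	return !TOY_HDR_ALIGNED_P(hdr);
}
```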


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
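Reading the nodeinfo value as a bitmap amounts to testing individual bits. A minimal sketch, with hypothetical macro and function names for the two bits named in the entry above:

```c
#include <stdbool.h>

/* Hypothetical names for the two bits described in the log entry. */
#define TOY_NODEINFO_FQDN	(1 << 0)	/* 2^0: answer ping6 -w */
#define TOY_NODEINFO_NODEADDR	(1 << 1)	/* 2^1: answer ping6 -a */

/* True when the given query type is enabled in the bitmap. */
static bool
toy_nodeinfo_allows(int nodeinfo, int bit)
{
	return (nodeinfo & bit) != 0;
}
```

Under this reading a value of 3 would enable both reply types, 1 or 2 only one of them, and 0 neither.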


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
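
A minimal sketch of the hiwat/lowat policy described above, assuming a simple counter of existing PMTUD route entries. The function and parameter names are hypothetical; the treatment of negative limits follows the note in revision 1.55, and the exact comparison used by the kernel may differ.

```c
#include <stdbool.h>

/*
 * Decide whether one more PMTUD route entry may be created.
 * "validated" is true when an upper-layer ctlinput handler vouched
 * for the too-big message.  A negative limit turns the check off.
 */
static bool
pmtud_entry_allowed(int current_entries, bool validated, int hiwat, int lowat)
{
	int limit = validated ? hiwat : lowat;

	if (limit < 0)
		return true;
	return current_entries < limit;
}
```

Because unvalidated traffic is held to the smaller "lowat" bound, spoofed too-big messages cannot grow the routing table without limit.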


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
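
The idea of event-per-second rate limitation can be sketched in userland C as below. This is a toy model, not the kernel routine: the structure, function name, and signature are hypothetical, and the caller supplies the current time in whole seconds.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-limiter state. */
struct pps_state {
	time_t window;	/* the second in which the current count started */
	int count;	/* events seen during that second */
};

/* Return true if this event fits within maxpps events for the second "now". */
static bool
pps_allow(struct pps_state *st, time_t now, int maxpps)
{
	if (st->window != now) {
		/* A new second has started: reset the counter. */
		st->window = now;
		st->count = 0;
	}
	st->count++;
	return st->count <= maxpps;
}
```

An ICMPv6 error path would call such a check before sending and drop the error silently when it returns false.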


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.206 16-Jan-2017 christos

ip6_sprintf -> IN6_PRINT so that we pass the size.


# 1.205 16-Jan-2017 ryo

Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@


Revision tags: bouyer-socketcan-base
# 1.204 13-Jan-2017 ozaki-r

Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that a icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
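
The conversion between the two time bases mentioned above can be sketched as follows, assuming the caller samples both clocks at the same moment. The function names are hypothetical and the arithmetic is only an illustration of the idea, not the kernel code.

```c
#include <time.h>

/*
 * Convert a timestamp kept on the monotonic uptime clock into
 * wall-clock time, e.g. before handing it to a userland program.
 */
static time_t
uptime_to_wallclock(time_t stamp_uptime, time_t now_uptime, time_t now_wall)
{
	return stamp_uptime + (now_wall - now_uptime);
}

/* The reverse direction, for values received from userland. */
static time_t
wallclock_to_uptime(time_t stamp_wall, time_t now_uptime, time_t now_wall)
{
	return stamp_wall - (now_wall - now_uptime);
}
```

Only the offset between the clocks is applied at the boundary, so a later settimeofday(2) changes what userland sees but not the expiry logic inside the kernel.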


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
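
The reference-counting discipline listed above can be illustrated with a toy userland model. All names here (toy_route, toy_route_lookup, and so on) are hypothetical stand-ins and do not reproduce the real rtentry API; the sketch only shows "lookup takes a reference, the caller releases it through one free function".

```c
#include <stdlib.h>

struct toy_route {
	int refcnt;	/* references from the table and from in-flight users */
	int mtu;
};

static int toy_freed;	/* counts destroyed routes, for demonstration only */

/* Create a route holding one reference on behalf of the routing table. */
static struct toy_route *
toy_route_create(void)
{
	struct toy_route *rt = calloc(1, sizeof(*rt));

	if (rt != NULL)
		rt->refcnt = 1;
	return rt;
}

/* Functions returning a route increment its count; the caller must free it. */
static struct toy_route *
toy_route_lookup(struct toy_route *rt)
{
	rt->refcnt++;
	return rt;
}

/* The only place the count is decremented; frees on the last reference. */
static void
toy_route_free(struct toy_route *rt)
{
	if (--rt->refcnt == 0) {
		free(rt);
		toy_freed++;
	}
}
```

As the commit message notes, a count like this only keeps the object alive; it does not stop another thread from modifying it, which still needs a lock or similar protection.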


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of
the address part at `slenp', if `slenp' is not NULL.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), it is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
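
A rough userland analogue of the alignment test described above, written as a sketch rather than the kernel's actual macros: the name HDR_ALIGNED_P, the helper aligned4() and the 4-byte requirement are assumptions made for illustration.

```c
#include <stdint.h>

/* Helper for this sketch: true if `p' sits on a 4-byte boundary. */
static int
aligned4(const void *p)
{
	return ((uintptr_t)p & 3) == 0;
}

/*
 * Hypothetical *_HDR_ALIGNED_P()-style macro: it expands to the constant 1
 * when __NO_STRICT_ALIGNMENT is defined, so such platforms never pay for
 * the test or take the realignment path.
 */
#ifdef __NO_STRICT_ALIGNMENT
#define HDR_ALIGNED_P(p)	1
#else
#define HDR_ALIGNED_P(p)	aligned4(p)
#endif
```

A caller would test the header pointer with the macro and only copy the header to an aligned location when the test fails.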


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
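
The bitmap encoding of net.inet6.icmp6.nodeinfo can be sketched as follows. The bit positions come from the commit message; the macro and function names are hypothetical and are not the kernel's identifiers.

```c
/* Bit assignments as stated in the commit message. */
#define NODEINFO_PING6_W	(1 << 0)	/* 2^0: ping6 -w */
#define NODEINFO_PING6_A	(1 << 1)	/* 2^1: ping6 -a */

/* Return nonzero if the given bit is enabled in the sysctl value. */
static int
nodeinfo_allows(int nodeinfo, int bit)
{
	return (nodeinfo & bit) != 0;
}
```

With this encoding a value of 3 enables both query types, 1 or 2 enables one of them, and 0 disables both.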


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
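
Event-per-second rate limitation can be sketched in userland roughly as below. This is an illustrative model under stated assumptions, not the kernel routine: struct pps_state and pps_allow() are hypothetical names, and the caller passes the current time in whole seconds.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-limiter state. */
struct pps_state {
	time_t	window;	/* second in which the current count started */
	int	count;	/* events allowed so far in that second */
};

/*
 * Return true if an event at time `now' stays within `maxpps' events
 * per second, false if it should be suppressed.
 */
static bool
pps_allow(struct pps_state *st, time_t now, int maxpps)
{
	/* A new second starts a fresh count. */
	if (now != st->window) {
		st->window = now;
		st->count = 0;
	}
	if (st->count >= maxpps)
		return false;
	st->count++;
	return true;
}
```

A sender of ICMPv6 errors would call the check before each error and drop the error silently when it returns false.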


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was a chimera of the 03 and 05 drafts.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield changes from the 04 draft to the 05 draft, which make
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.204 13-Jan-2017 ozaki-r

Tweak icmp6_input; always use off, not *offp


Revision tags: pgoyette-localcount-20170107
# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.
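
The copy-out pattern this revision describes can be sketched in userland as follows. It is only an analogy: a simple depth counter stands in for the pserialize read section, and `addr6`, `table`, `read_enter`, `read_exit` and `select_src_copy` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct addr6 { unsigned char b[16]; };	/* stand-in for struct in6_addr */

static struct addr6 table[2];	/* shared data; may change outside a read section */
static int read_depth;		/* > 0 while inside the simulated read section */

static void read_enter(void) { read_depth++; }
static void read_exit(void)  { assert(read_depth > 0); read_depth--; }

/*
 * Copy the selected address into caller-owned storage while still inside
 * the read section, instead of returning a pointer into shared data that
 * could be invalidated once the section is left.
 */
static int
select_src_copy(size_t idx, struct addr6 *out)
{
	int error = 0;

	read_enter();
	if (idx >= sizeof(table) / sizeof(table[0]))
		error = EADDRNOTAVAIL;
	else
		memcpy(out, &table[idx], sizeof(*out));
	read_exit();
	return error;
}
```

Because the caller owns `out`, later changes to the shared table cannot affect the value it already received.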


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that a icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
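
The idea of storing an interface index in the packet and resolving it on use can be sketched in userland like this. It is a simplified model under stated assumptions: `ifnet_sketch`, `if_table`, `pkt` and `pkt_get_rcvif` are hypothetical names, and the sketch omits the psref(9)/pserialize(9) protection that the real APIs add around the lookup.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IF 8

struct ifnet_sketch { int index; };	/* stand-in for struct ifnet */

/* Index -> interface collection; a slot is NULL once the interface is gone. */
static struct ifnet_sketch *if_table[MAX_IF];

/* The packet records only the index of the receiving interface. */
struct pkt { int rcvif_index; };

/*
 * Resolve the receiving interface at the time of use. The result may be
 * NULL if the interface was destroyed after the packet was received, so
 * callers must check it.
 */
static struct ifnet_sketch *
pkt_get_rcvif(const struct pkt *p)
{
	assert(p != NULL);
	if (p->rcvif_index < 0 || p->rcvif_index >= MAX_IF)
		return NULL;
	return if_table[p->rcvif_index];
}
```

A stale pointer would be dereferenced blindly, whereas a stale index simply resolves to NULL, which is why callers gained NULL checks in later revisions.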


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

See the diffs under tests/ for details of the behavior changes.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
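
The conversion at the kernel/userland boundary mentioned above amounts to shifting a timestamp by the difference between the two clocks. A minimal sketch, assuming both clocks are sampled at the same instant; `uptime_to_wall` and `wall_to_uptime` are hypothetical helper names, not the functions used in the kernel:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert an expiry kept in monotonic uptime seconds to wall-clock seconds,
 * given simultaneous samples of both clocks.
 */
static int64_t
uptime_to_wall(int64_t t_uptime, int64_t now_uptime, int64_t now_wall)
{
	assert(now_uptime >= 0);
	return t_uptime - now_uptime + now_wall;
}

/* The inverse conversion, for times supplied by userland. */
static int64_t
wall_to_uptime(int64_t t_wall, int64_t now_uptime, int64_t now_wall)
{
	assert(now_uptime >= 0);
	return t_wall - now_wall + now_uptime;
}
```

Since the stored value is uptime-based, a step in the wall clock changes only the converted value reported to userland, not the remaining lifetime of the cache entry.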


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
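
The three rules listed above can be illustrated with a small userland model. This is a sketch, not the routing code: `rt_sketch`, `rt_lookup` and `rt_release` are hypothetical names, and the `freed` flag stands in for actually freeing the entry.

```c
#include <assert.h>
#include <stddef.h>

struct rt_sketch {
	int refcnt;	/* references from the table and from in-flight users */
	int freed;	/* set when the last reference is dropped */
};

/* A function returning an entry hands it back with the count incremented. */
static struct rt_sketch *
rt_lookup(struct rt_sketch *table_entry)
{
	if (table_entry == NULL)
		return NULL;
	table_entry->refcnt++;
	return table_entry;
}

/* The only way a caller drops its reference; never a bare refcnt--. */
static void
rt_release(struct rt_sketch *rt)
{
	assert(rt != NULL && rt->refcnt > 0);
	if (--rt->refcnt == 0)
		rt->freed = 1;	/* a real implementation would free here */
}
```

As the commit notes, this discipline only keeps the entry from being freed while in use; it does not stop another thread from modifying it.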


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
Ipsec enabled is controlled by sysctl and determines if is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).
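
The allocate/copy/dup/free semantics in item 1 can be sketched in
userland as follows. This is only an illustration under stated
assumptions: it uses malloc instead of per-family pools, omits the
'flags' argument, and every demo_* name, the DEMO_AF_* constants and
the per-family sizes are hypothetical stand-ins:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a BSD-style sockaddr with a length field. */
struct demo_sockaddr {
	unsigned char sa_len;     /* length of the family-specific address */
	unsigned char sa_family;  /* address family */
	char          sa_data[30];
};

#define DEMO_AF_INET   2
#define DEMO_AF_INET6  24

/* Per-family size; the real code would consult the family's pool. */
static size_t
demo_family_len(unsigned char af)
{
	switch (af) {
	case DEMO_AF_INET:  return 16;
	case DEMO_AF_INET6: return 28;
	default:            return 0;
	}
}

/* Like sockaddr_alloc(): NULL on failure, else family and length preset. */
static struct demo_sockaddr *
demo_sockaddr_alloc(unsigned char af)
{
	size_t len = demo_family_len(af);
	struct demo_sockaddr *sa;

	if (len == 0 || (sa = calloc(1, sizeof(*sa))) == NULL)
		return NULL;
	sa->sa_family = af;
	sa->sa_len = (unsigned char)len;
	return sa;
}

/* Like sockaddr_copy(): asserts matching families, then copies. */
static struct demo_sockaddr *
demo_sockaddr_copy(struct demo_sockaddr *dst, const struct demo_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like sockaddr_dup(): allocate for the source's family, then copy. */
static struct demo_sockaddr *
demo_sockaddr_dup(const struct demo_sockaddr *src)
{
	struct demo_sockaddr *sa = demo_sockaddr_alloc(src->sa_family);

	return sa == NULL ? NULL : demo_sockaddr_copy(sa, src);
}

/* Like sockaddr_free(): return the storage to where it came from. */
static void
demo_sockaddr_free(struct demo_sockaddr *sa)
{
	free(sa);
}
```

The assert() in demo_sockaddr_copy() plays the role the text assigns
to KASSERT in sockaddr_copy().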

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.
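
The caller-side consequence of item 3 (a destination accessor that can
return NULL) might look like the sketch below. It is a userland mock:
demo_route, demo_rtcache_setdst(), demo_rtcache_getdst() and
demo_route_matches() are hypothetical names that only imitate the
return conventions described above:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct demo_sockaddr {
	unsigned char sa_len;
	unsigned char sa_family;
	char          sa_data[14];
};

/* Hypothetical route cache: holds a pointer, not embedded storage. */
struct demo_route {
	struct demo_sockaddr *ro_sa;  /* NULL until a destination is set */
};

/* Like rtcache_setdst(): 0 on success, ENOMEM if storage is unavailable. */
static int
demo_rtcache_setdst(struct demo_route *ro, const struct demo_sockaddr *dst)
{
	struct demo_sockaddr *sa = malloc(sizeof(*sa));

	if (sa == NULL)
		return ENOMEM;
	memcpy(sa, dst, sizeof(*sa));
	free(ro->ro_sa);  /* drop any previous destination */
	ro->ro_sa = sa;
	return 0;
}

/* Like rtcache_getdst(): may return NULL, so callers must check. */
static const struct demo_sockaddr *
demo_rtcache_getdst(const struct demo_route *ro)
{
	return ro->ro_sa;
}

/* Caller pattern: a NULL destination means "no usable cache". */
static int
demo_route_matches(const struct demo_route *ro, unsigned char af)
{
	const struct demo_sockaddr *dst = demo_rtcache_getdst(ro);

	return dst != NULL && dst->sa_family == af;
}
```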

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop ( and local dos ) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
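
A rough userland sketch of the *_HDR_ALIGNED_P() idea described above.
The macro name DEMO_HDR_ALIGNED_P, the switch DEMO_NO_STRICT_ALIGNMENT
and the 4-byte alignment requirement are assumptions made for
illustration, not the kernel's definitions:

```c
#include <stdint.h>

/*
 * On platforms without strict alignment the check collapses to the
 * constant 1, so the compiler can drop the align-if-needed path.
 */
#ifdef DEMO_NO_STRICT_ALIGNMENT
#define DEMO_HDR_ALIGNED_P(p)	1
#else
#define DEMO_HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/*
 * Returns 1 if the header may be used in place, 0 if the caller
 * should first copy it to aligned storage (the role m_copyup()
 * plays for mbufs in the change above).
 */
static int
demo_can_use_in_place(const void *hdr)
{
	return DEMO_HDR_ALIGNED_P(hdr);
}
```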


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.
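
The bitmap interpretation of net.inet6.icmp6.nodeinfo mentioned above
can be illustrated as below. The macro and function names are
hypothetical; only the bit assignments (2^0 for ping6 -w, 2^1 for
ping6 -a) come from the commit message:

```c
/* Hypothetical names for the two bits named in the commit message. */
#define DEMO_NODEINFO_W  (1 << 0)  /* 2^0: serve "ping6 -w" queries */
#define DEMO_NODEINFO_A  (1 << 1)  /* 2^1: serve "ping6 -a" queries */

/* Returns 1 if the given query class is enabled in the sysctl value. */
static int
demo_nodeinfo_allows(int nodeinfo, int bit)
{
	return (nodeinfo & bit) != 0;
}
```

Under this reading, a value of 3 enables both query types and 0
disables both.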


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
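
A minimal sketch of event-per-second rate limiting in the spirit of
the change above. It is not the kernel routine: the demo_pps state,
demo_ppscheck() and its signature are hypothetical, the clock is passed
in as whole seconds, and treating a negative limit as "unlimited" is an
assumption of this sketch:

```c
/* Hypothetical limiter state; start with window_start = -1. */
struct demo_pps {
	long window_start;  /* second in which the current window began */
	int  count;         /* events admitted in the current window */
};

/*
 * Returns 1 if the event may proceed, 0 if it would exceed maxpps
 * within the current one-second window.
 */
static int
demo_ppscheck(struct demo_pps *st, long now_sec, int maxpps)
{
	if (now_sec != st->window_start) {
		/* A new second has begun: start a fresh window. */
		st->window_start = now_sec;
		st->count = 0;
	}
	if (maxpps < 0)
		return 1;  /* assumption: negative means no limit */
	if (st->count >= maxpps)
		return 0;
	st->count++;
	return 1;
}
```

An ICMPv6 error path could call such a check before building a reply
and silently drop the error when it returns 0.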


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield changes from the 04 draft to the 05 draft,
which make 04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.


# 1.203 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.202 11-Dec-2016 ozaki-r

Correct sanity checks of icmp6_redirect_output

- rt->rt_ifp is always non-NULL
- Checking RTF_UP here is just racy and meaningless
- The arguments should be non-NULL (at least for now)


Revision tags: nick-nhusb-base-20161204
# 1.201 15-Nov-2016 mlelstv

Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.


Revision tags: pgoyette-localcount-20161104
# 1.200 31-Oct-2016 ozaki-r

Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.


# 1.199 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.198 18-Oct-2016 ozaki-r

Remove unnecessary pserialize_read_enter


Revision tags: nick-nhusb-base-20161004 localcount-20160914
# 1.197 26-Aug-2016 dholland

PR 51434 David Binderman: remove redundant test.


# 1.196 19-Aug-2016 roy

Revert r1.148
IP6_EXTHDR_GET ensures that an icmp6 header can be fetched from the mbuf
so m_pullup does not need to be called.

While here, we can safely increment interface error stats even with an
invalidated mbuf because we have a saved reference to the interface.


Revision tags: pgoyette-localcount-20160806
# 1.195 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726
# 1.194 15-Jul-2016 ozaki-r

Use sin6tosa and sin6tocsa macros

No functional change.


# 1.193 15-Jul-2016 ozaki-r

Use ifatoia6 macro

No functional change.


Revision tags: pgoyette-localcount-base nick-nhusb-base-20160907
# 1.192 07-Jul-2016 ozaki-r

branches: 1.192.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.


# 1.191 05-Jul-2016 ozaki-r

Use ia6 or ia instead of ifa as a variable name of struct in6_ifaddr

We conventionally use ifa for struct ifaddr and use ia6 or ia for
struct in6_ifaddr.

No functional change.


# 1.190 28-Jun-2016 ozaki-r

Add missing NULL checks for m_get_rcvif_psref


# 1.189 21-Jun-2016 ozaki-r

Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.


# 1.188 10-Jun-2016 ozaki-r

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and is for a transition
moratorium, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
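
The idea of storing an interface index rather than a pointer can be
illustrated with a small userland sketch. This is not the kernel code:
struct ifnet_sketch, struct pkt_sketch, if_table and the pkt_* helpers are
hypothetical stand-ins, and the psref(9)/pserialize(9) protection that the
real m_{get,put}_rcvif* APIs provide is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IF 8			/* hypothetical table size */

struct ifnet_sketch {			/* stand-in for struct ifnet */
	const char *name;
};

/* Stand-in for the interface collection, indexed by interface index. */
static struct ifnet_sketch *if_table[MAX_IF];

struct pkt_sketch {			/* stand-in for an mbuf packet header */
	unsigned int rcvif_index;	/* an index is stored, not a pointer */
};

/* Record the receiving interface by its index. */
static void
pkt_set_rcvif(struct pkt_sketch *p, unsigned int idx)
{
	assert(idx < MAX_IF);
	p->rcvif_index = idx;
}

/*
 * Resolve the index at use time.  A NULL result means the interface
 * has been destroyed since the packet was received.
 */
static struct ifnet_sketch *
pkt_get_rcvif(const struct pkt_sketch *p)
{
	if (p->rcvif_index >= MAX_IF)
		return NULL;
	return if_table[p->rcvif_index];
}
```

Because the lookup happens at the point of use, a caller sees NULL once the
interface is gone instead of following a stale pointer, which is presumably
why callers of the real API need NULL checks.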


# 1.187 10-Jun-2016 ozaki-r

Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.


Revision tags: nick-nhusb-base-20160529
# 1.186 18-May-2016 ozaki-r

Don't try to get outif unnecessarily from in6_selectsrc

The got outif is unused.


# 1.185 17-May-2016 ozaki-r

Get rcvif once and reuse it

No functional change.


# 1.184 17-May-2016 ozaki-r

Make sure icmp6_redirect_input frees mbuf before return


# 1.183 12-May-2016 ozaki-r

Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.


Revision tags: nick-nhusb-base-20160422
# 1.182 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can learn the details of the behavior changes from the diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.181 01-Apr-2016 ozaki-r

Remove unnecessary casts and do s/0/NULL/ for rtrequest


# 1.180 01-Apr-2016 ozaki-r

Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.


Revision tags: nick-nhusb-base-20160319
# 1.179 21-Jan-2016 riastradh

Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.


# 1.178 21-Jan-2016 riastradh

Give proper prototype to ip_output.


Revision tags: nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.177 14-Sep-2015 ozaki-r

Update icmp6_redirect_timeout_q when changing net.inet6.icmp6.redirtimeout

We have to update icmp6_redirect_timeout_q as well as icmp6_redirtimeout
when changing net.inet6.icmp6.redirtimeout via sysctl. The updating logic
is copied from sysctl_net_inet_icmp_redirtimeout.

This change is from s-yamaguchi@IIJ (with KNF by ozaki-r) and fixes
PR kern/50240.


# 1.176 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.175 24-Aug-2015 pooka

sprinkle _KERNEL_OPT


# 1.174 24-Aug-2015 ozaki-r

Change 0 to NULL for rtrequest's last argument (struct rtentry **ret_nrt)


# 1.173 07-Aug-2015 ozaki-r

Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
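
A minimal userland sketch of the approach described above, assuming a
hypothetical struct clock_sketch that stands in for the kernel's time_uptime
and time_second variables. The helper names are invented for illustration
and are not the functions used in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for time_uptime and time_second. */
struct clock_sketch {
	int64_t uptime;		/* monotonic seconds since boot */
	int64_t wall;		/* wall-clock seconds, may leap */
};

/* Compute an expiration time on the monotonic clock. */
static int64_t
expire_after(const struct clock_sketch *c, int64_t lifetime)
{
	assert(lifetime >= 0);
	return c->uptime + lifetime;
}

/* Expiry checks only ever consult the monotonic clock. */
static bool
is_expired(const struct clock_sketch *c, int64_t expire)
{
	return c->uptime >= expire;
}

/* Convert an uptime-based time for userland that expects wall-clock time. */
static int64_t
uptime_to_wall(const struct clock_sketch *c, int64_t t)
{
	return t + (c->wall - c->uptime);
}

/* Convert a wall-clock time from userland back to the uptime base. */
static int64_t
wall_to_uptime(const struct clock_sketch *c, int64_t t)
{
	return t - (c->wall - c->uptime);
}
```

In this sketch a jump of the wall clock changes only the conversion offset;
an expiration computed with expire_after() is unaffected by it.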


# 1.172 24-Jul-2015 ozaki-r

Fix rtfree-ing wrong rtentry


# 1.171 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
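
The reference counting discipline listed above can be sketched in userland
as follows. struct rt_sketch and both helpers are hypothetical, and the
sketch assumes, as a simplification, that the table itself holds one counted
reference; it does not reproduce the real rtentry or rtfree() logic.

```c
#include <assert.h>
#include <stddef.h>

struct rt_sketch {		/* stand-in for struct rtentry */
	int refcnt;		/* number of live references */
	int freed;		/* set instead of really freeing memory */
};

/*
 * A function that returns an entry takes a reference on behalf of
 * the caller, so the entry cannot go away while it is in use.
 */
static struct rt_sketch *
rt_lookup_sketch(struct rt_sketch *entry)
{
	if (entry == NULL)
		return NULL;
	entry->refcnt++;
	return entry;
}

/*
 * The only way to drop a reference.  Callers never decrement
 * refcnt directly and never touch the entry after this call.
 */
static void
rt_free_sketch(struct rt_sketch *rt)
{
	assert(rt->refcnt > 0);
	if (--rt->refcnt == 0)
		rt->freed = 1;
}
```

As the commit message notes, a count like this only keeps the entry from
being freed; it does not protect the entry from concurrent updates.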


Revision tags: nick-nhusb-base-20150606 nick-nhusb-base-20150406 nick-nhusb-base
# 1.170 25-Nov-2014 christos

branches: 1.170.2;
CID 977389: Out of bounds access.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.169 06-Jun-2014 rmind

branches: 1.169.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


# 1.168 30-May-2014 christos

Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if ipsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.


# 1.167 19-May-2014 rmind

- Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.


Revision tags: rmind-smpnet-nbase rmind-smpnet-base
# 1.166 18-May-2014 rmind

Use IFNET_FIRST() rather than open coding ifnet access.


Revision tags: yamt-pagecache-base9 riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3
# 1.165 25-Feb-2014 pooka

branches: 1.165.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.


# 1.164 20-Feb-2014 joerg

Bail out in case m_pulldown failed.


# 1.163 23-Nov-2013 christos

convert from CIRCLEQ to TAILQ.


Revision tags: riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base
# 1.162 05-Jun-2013 christos

branches: 1.162.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.


Revision tags: agc-symver-base yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6
# 1.161 23-Jun-2012 christos

branches: 1.161.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD


Revision tags: jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8
# 1.160 22-Mar-2012 drochner

remove KAME IPSEC, replaced by FAST_IPSEC


Revision tags: netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3 jmcneill-usbmp-base2 netbsd-6-base
# 1.159 31-Dec-2011 christos

branches: 1.159.2; 1.159.6; 1.159.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0


# 1.158 19-Dec-2011 drochner

rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.


Revision tags: jmcneill-usbmp-pre-base2 jmcneill-usbmp-base jmcneill-audiomp3-base yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base
# 1.157 31-Aug-2011 plunky

branches: 1.157.2; 1.157.6;
NULL does not need a cast


Revision tags: rmind-uvmplock-nbase cherry-xenmp-base bouyer-quota2-nbase bouyer-quota2-base jruoho-x86intr-base matt-mips64-premerge-20101231 uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 rmind-uvmplock-base
# 1.156 12-Sep-2010 drochner

avoid NULL dereference in error case


Revision tags: uebayasi-xip-base2 yamt-nfs-mp-base10 uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211 jym-xensuspend-nbase
# 1.155 18-Oct-2009 christos

branches: 1.155.2; 1.155.4;
fix the sun2 case for real.


# 1.154 12-Oct-2009 christos

unbreak sun2.


# 1.153 16-Sep-2009 pooka

Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.152 18-Mar-2009 cegger

bzero -> memset


# 1.151 18-Mar-2009 cegger

bcmp -> memcmp


Revision tags: netbsd-5-2-3-RELEASE netbsd-5-1-5-RELEASE netbsd-5-2-2-RELEASE netbsd-5-1-4-RELEASE netbsd-5-2-1-RELEASE netbsd-5-1-3-RELEASE netbsd-5-2-RELEASE netbsd-5-2-RC1 netbsd-5-1-2-RELEASE netbsd-5-1-1-RELEASE matt-nb5-mips64-premerge-20101231 matt-nb5-pq3-base netbsd-5-1-RELEASE netbsd-5-1-RC4 matt-nb5-mips64-k15 netbsd-5-1-RC3 netbsd-5-1-RC2 netbsd-5-1-RC1 netbsd-5-0-2-RELEASE matt-nb5-mips64-premerge-20091211 matt-nb5-mips64-u2-k2-k4-k7-k8-k9 matt-nb4-mips64-k7-u2a-k9b matt-nb5-mips64-u1-k1-k5 netbsd-5-0-1-RELEASE netbsd-5-0-RELEASE netbsd-5-0-RC4 netbsd-5-0-RC3 nick-hppapmap-base2 netbsd-5-0-RC2 netbsd-5-0-RC1 haad-dm-base2 haad-nbase2 ad-audiomp2-base netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 haad-dm-base mjf-devfs2-base
# 1.150 03-Oct-2008 adrianp

branches: 1.150.2; 1.150.8;
Fix for CVE-2008-3530 from matt@
Implement improved checking for MTU values on ICMP 'Packet Too Big Messages'


Revision tags: wrstuden-revivesa-base-3 wrstuden-revivesa-base-2
# 1.149 06-Aug-2008 plunky

Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core


Revision tags: wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base
# 1.148 07-May-2008 bouyer

branches: 1.148.2; 1.148.6;
Sync with ipv4 icmp_input(): make sure the mbuf is writable and
contains the entire icmp message before calling icmp6_input().
should fix "panic: mbuf too short for IPv6 header" seen by several people.


# 1.147 04-May-2008 thorpej

Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577


Revision tags: yamt-nfs-mp-base
# 1.146 23-Apr-2008 thorpej

branches: 1.146.2;
Use <net/net_stats.h> / netstat_sysctl().


Revision tags: yamt-pf42-baseX yamt-pf42-base
# 1.145 15-Apr-2008 thorpej

branches: 1.145.2;
Make ip6 and icmp6 stats per-cpu.


# 1.144 08-Apr-2008 thorpej

Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.


# 1.143 08-Apr-2008 thorpej

Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
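
Why an array of uint64_t's can stay ABI-compatible with a structure of
uint64_t counters is shown in the sketch below. The index names, the
three-counter layout and the helper functions are hypothetical; the real
counter names and their order are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counter indices. */
enum {
	SKETCH_STAT_ERROR = 0,
	SKETCH_STAT_CANTERROR = 1,
	SKETCH_STAT_TOOFREQ = 2,
	SKETCH_NSTATS = 3
};

/* Stand-in for the old structure layout that existing binaries expect. */
struct old_stat_sketch {
	uint64_t error;
	uint64_t canterror;
	uint64_t toofreq;
};

/* The new representation: one array slot per counter. */
static uint64_t stat_sketch[SKETCH_NSTATS];

/* The two layouts must agree for old readers to keep working. */
_Static_assert(sizeof(struct old_stat_sketch) == sizeof(stat_sketch),
    "size mismatch");
_Static_assert(offsetof(struct old_stat_sketch, toofreq) ==
    SKETCH_STAT_TOOFREQ * sizeof(uint64_t), "offset mismatch");

/* Bump one counter by index. */
static void
stat_inc(unsigned int idx)
{
	assert(idx < SKETCH_NSTATS);
	stat_sketch[idx]++;
}

/* Export the counters in the layout an old consumer expects. */
static void
stat_export(struct old_stat_sketch *out)
{
	memcpy(out, stat_sketch, sizeof(*out));
}
```

Indexing by constant also makes it straightforward to keep one such array
per CPU and sum the arrays on export.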


Revision tags: ad-socklock-base1 yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase keiichi-mipv6-base matt-armv6-nbase
# 1.142 27-Feb-2008 matt

Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.


Revision tags: nick-net80211-sync-base bouyer-xeni386-merge1 vmlocking2-base3 bouyer-xeni386-nbase yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 bouyer-xeni386-base yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase mjf-devfs-base matt-armv6-base jmcneill-pm-base hpcarm-cleanup-base reinoud-bufcleanup-base
# 1.141 04-Dec-2007 dyoung

branches: 1.141.8; 1.141.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().


Revision tags: vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase bouyer-xenamd64-base
# 1.140 01-Nov-2007 dyoung

branches: 1.140.2; 1.140.4;
De-__P().


# 1.139 29-Oct-2007 dyoung

The IPv6 stack labels incoming packets with an m_tag whose payload
is a struct ip6aux. A struct ip6aux used to contain a pointer to
an in6_ifaddr, but that pointer could become a dangling reference
in the lifetime of the m_tag, because ip6_setdstifaddr() did not
increase the in6_ifaddr's reference count. I have removed the
pointer from ip6aux. I load it with the interesting fields from
the in6_ifaddr (an IPv6 address, a scope ID, and some flags),
instead.


# 1.138 24-Oct-2007 dyoung

Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.


Revision tags: yamt-x86pmap-base4 yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base vmlocking-base
# 1.137 19-Sep-2007 dyoung

branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.


Revision tags: nick-csl-alignment-base5
# 1.136 10-Aug-2007 dyoung

branches: 1.136.2;
Constify. bcopy -> memcpy.


Revision tags: matt-mips64-base
# 1.135 19-Jul-2007 dyoung

branches: 1.135.4; 1.135.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.134 13-Jun-2007 dyoung

branches: 1.134.2;
Persuasive programming: check M_UNWRITABLE(m, len) instead of
m->m_len<len before pulling up, because that helps make it clear
that we m_pullup() in order to guarantee that the contiguous region
is *writable*.


# 1.133 23-May-2007 christos

Ansify + add a few comments, from Karl Sjödahl


Revision tags: yamt-idlelwp-base8
# 1.132 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
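
The per-domain list of route caches in item 4 can be sketched as a singly
linked list. All names here (struct route_sketch, struct domain_sketch,
cache_add, cache_flushall) are hypothetical, and the sketch ignores locking
and reference counting on the cached route.

```c
#include <assert.h>
#include <stddef.h>

struct route_sketch {			/* stand-in for struct route */
	const void *ro_rt;		/* cached route, NULL when invalid */
	struct route_sketch *next;	/* link in the domain's cache list */
};

struct domain_sketch {			/* stand-in for struct domain */
	struct route_sketch *caches;	/* list of live route caches */
};

/* Remember a cached route and put the cache on the domain's list. */
static void
cache_add(struct domain_sketch *dom, struct route_sketch *ro, const void *rt)
{
	assert(rt != NULL);
	ro->ro_rt = rt;
	ro->next = dom->caches;
	dom->caches = ro;
}

/* Walk the domain's list, invalidate every cache, and empty the list. */
static void
cache_flushall(struct domain_sketch *dom)
{
	struct route_sketch *ro, *next;

	for (ro = dom->caches; ro != NULL; ro = next) {
		next = ro->next;
		ro->ro_rt = NULL;
		ro->next = NULL;
	}
	dom->caches = NULL;
}
```

Users of a cache must then expect ro_rt to turn to NULL at any time and
redo the lookup when it does.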


Revision tags: thorpej-atomic-base
# 1.131 04-Mar-2007 christos

branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.130 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


# 1.129 10-Feb-2007 degroote

branches: 1.129.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packet with extensions headers are not correctly
supported
Change the ipcomp logic


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.128 29-Jan-2007 dyoung

bzero -> memset


# 1.127 15-Jan-2007 dyoung

Cosmetic: indent using ASCII horizontal tab, insert space following
comma, wrap line.


# 1.126 15-Jan-2007 degroote

Fix an infinite loop (and local DoS) in the case where the ip6_hdr and
the icmp6_hdr are not in the same mbuf.
Fix pr/34994 and probably pr/35333
Ok @rpaulo


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.125 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.124 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


Revision tags: netbsd-4-base
# 1.123 16-Nov-2006 christos

branches: 1.123.2;
__unused removal on arguments; approved by core.


Revision tags: yamt-splraiseipl-base2
# 1.122 12-Oct-2006 christos

- sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386


Revision tags: yamt-splraiseipl-base yamt-pdpolicy-base9 rpaulo-netinet-merge-pcb-base
# 1.121 05-Sep-2006 dyoung

branches: 1.121.2; 1.121.4;
Simplify and repair icmp6_input() to stop the kernel from panicking
in m_copydata() when an ICMP6_ECHO_REQUEST is received, as reported
by Tatoku Ogaito on current-users@.


Revision tags: yamt-pdpolicy-base8
# 1.120 01-Sep-2006 dyoung

Vastly simplify the code that copies an ICMP6 packet to two data
paths: ICMP6 reply path, and socket path.


# 1.119 30-Aug-2006 christos

declare the type of code.


Revision tags: abandoned-netbsd-4-base yamt-pdpolicy-base7
# 1.118 11-Jul-2006 tron

Clear mbuf checksum flags before passing it to ip6_output(). We might
recycle a mbuf which contained a hardware provided checksum. This
fixes "traceroute6" to a machine which is using a wm(4) interface
that has UDP or TCP checksum offload enabled.


Revision tags: yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base chap-midi-base
# 1.117 07-Jun-2006 kardel

branches: 1.117.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html


Revision tags: yamt-pdpolicy-base5 elad-kernelauth-base simonb-timecounters-base
# 1.116 15-Apr-2006 christos

branches: 1.116.2;
Coverity CID 740: Change constant comparisons to MCLBYTES to KASSERT and remove
extraneous tests.


Revision tags: yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2
# 1.115 05-Mar-2006 rpaulo

branches: 1.115.2; 1.115.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.


Revision tags: yamt-pdpolicy-base
# 1.114 03-Mar-2006 rpaulo

branches: 1.114.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.


Revision tags: yamt-uio_vmspace-base5
# 1.113 21-Jan-2006 rpaulo

branches: 1.113.2; 1.113.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.
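
The ::1 example in the list above can be expressed as a tiny user-space
predicate. This is a sketch of the idea only, not the kernel's scope-checking
code: the function name loopback_src_leaks is hypothetical, and it covers just
the loopback case mentioned in the log, using the standard inet_pton() and
IN6_IS_ADDR_LOOPBACK() interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Return 1 if a packet from src to dst would carry a loopback source
 * address off the node, 0 if the pair is acceptable, and -1 if either
 * string is not a valid IPv6 address.
 */
static int
loopback_src_leaks(const char *src, const char *dst)
{
	struct in6_addr s, d;

	assert(src != NULL && dst != NULL);
	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return -1;
	/* ::1 is only meaningful inside a single node. */
	return IN6_IS_ADDR_LOOPBACK(&s) && !IN6_IS_ADDR_LOOPBACK(&d);
}
```

A stricter boundary check would reject the send when this predicate returns 1,
for instance for the pair ("::1", "2001:db8::1").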

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.


# 1.112 11-Dec-2005 christos

branches: 1.112.2;
merge ktrace-lwp.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base ktrace-lwp-base
# 1.111 19-Oct-2005 bouyer

In icmp6_redirect_output(), sip6 is initialised to point to the data area of
m0. But m0 may be freed later, so trying to use sip6 at the end of this
function is wrong. My guess is that we want to reference the data area
of m (the mbuf about to be sent) instead at this point.
Fix a panic on Xen (where a data area of a mbuf may be unmapped when the
mbuf is freed), and probably potential data/pool corruption in other cases.


Revision tags: yamt-vop-base
# 1.110 18-Aug-2005 yamt

branches: 1.110.2;
- introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.


# 1.109 29-May-2005 christos

branches: 1.109.2;
- avoid shadowed variables
- sprinkle const.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base yamt-km-base2 yamt-km-base kent-audio2-base
# 1.108 17-Jan-2005 itojun

branches: 1.108.6; 1.108.8; 1.108.10;
shouldn't check code field on "packet too big" icmp6 message.


Revision tags: kent-audio1-beforemerge kent-audio1-base
# 1.107 25-May-2004 atatat

branches: 1.107.4;
Sysctl descriptions under net subtree (net.key not done)


Revision tags: netbsd-2-0-base
# 1.106 26-Mar-2004 itojun

branches: 1.106.2;
do not touch m->m_pkthdr.rcvif after m becomes invalid. Patrick Latifi


# 1.105 24-Mar-2004 atatat

Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.


# 1.104 17-Dec-2003 lha

Fix ICMPV6CTL_ND6_[DP]RLIST, they broke with new sysctl.
Makes ndp -r/ndp -p work again, patch from atatat


# 1.103 04-Dec-2003 atatat

Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.


# 1.102 30-Oct-2003 simonb

Remove some assigned-to but otherwise unused variables.


# 1.101 04-Sep-2003 itojun

revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).


# 1.100 25-Aug-2003 itojun

deref member in in6p directly, don't rely on existence of macro


# 1.99 22-Aug-2003 itojun

remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.


# 1.98 22-Aug-2003 itojun

change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.


# 1.97 22-Aug-2003 jonathan

Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.


# 1.96 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.95 06-Aug-2003 itojun

m_cat may free mbuf on 2nd arg, so m_pkthdr manipulation has to happen
before m_cat call. from Julian Coleman via kame.


# 1.94 24-Jun-2003 itojun

branches: 1.94.2;
remove unneeded checks of accept_rtadv. from kame


# 1.93 24-Jun-2003 itojun

use time.tv_sec directly


# 1.92 06-Jun-2003 itojun

- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).


# 1.91 03-Jun-2003 itojun

remove assumption on redirect header option processing. from kame


# 1.90 14-May-2003 itojun

always use PULLDOWN_TEST codepath.


# 1.89 31-Mar-2003 itojun

avoid mbuf leak in redirect header option attachment. more complete
fix to come. from kame


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base kqueue-aftermerge kqueue-beforemerge kqueue-base
# 1.88 27-Sep-2002 provos

remove trailing \n in panic(). approved perry.


# 1.87 23-Sep-2002 simonb

Remove breaks after returns, unreachable returns and returns after
returns(!).


# 1.86 11-Sep-2002 itojun

KNF - return is not a function. sync w/kame.


Revision tags: gehenna-devsw-base
# 1.85 30-Jul-2002 itojun

no need to check NULL mbuf, as we touch it already.
From: tedu <grendel@zeitbombe.org>


# 1.84 10-Jul-2002 itojun

correct ping6 -w result with hostname with [A-Z]. PR 17540. sync w/kame


# 1.83 30-Jun-2002 thorpej

Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.
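
The alignment test described in the list above can be sketched as follows.
The TOY_ macro names, the 4-byte boundary and the helper function are
assumptions made for illustration; they are not the definitions this commit
added to the tree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical macro in the spirit of the *_HDR_ALIGNED_P() family:
 * constant 1 when the platform tolerates unaligned access, otherwise a
 * test that the pointer sits on a 4-byte boundary.
 */
#ifdef TOY_NO_STRICT_ALIGNMENT
#define TOY_HDR_ALIGNED_P(p)	1
#else
#define TOY_HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/* Count how many of the n pointers would need the realignment path. */
static int
toy_count_misaligned(const void *const *ptrs, int n)
{
	int i, count = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		if (!TOY_HDR_ALIGNED_P(ptrs[i]))
			count++;
	}
	return count;
}
```

Because the macro collapses to the constant 1 when TOY_NO_STRICT_ALIGNMENT is
defined, a compiler can drop the branch entirely on such platforms, which is
the cost-avoidance the log entry describes.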

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).


# 1.82 09-Jun-2002 itojun

whitespace cleanup


# 1.81 08-Jun-2002 itojun

whitespace cleanup


# 1.80 31-May-2002 itojun

do not mistakenly lock PMTUD route entry with RTV_MTU.


# 1.79 29-May-2002 christos

make this compile again.


# 1.78 29-May-2002 itojun

correct rmx_mtu value after PMTUD entry timeout (should be set to 0)


# 1.77 24-May-2002 itojun

extra blank line


# 1.76 24-May-2002 itojun

make a strict check before sending FQDN node information reply. sync w/kame


Revision tags: netbsd-1-6-base eeh-devprop-base newlock-base
# 1.75 05-Mar-2002 itojun

branches: 1.75.6; 1.75.8;
on redirect output, always try to attach target link layer address option.


Revision tags: ifpoll-base
# 1.74 21-Dec-2001 itojun

whitespace/cosmetic sync w/kame


# 1.73 20-Dec-2001 itojun

centralize multicast group management (in6_join/leavegroup).
have a flag for ip6_output() to fragment to minimum MTU.
sync with kame


# 1.72 07-Dec-2001 itojun

correct timing to increment icmp6 MIB variables. sync with kame


# 1.71 13-Nov-2001 lukem

add RCSIDs


Revision tags: thorpej-mips-cache-base
# 1.70 29-Oct-2001 simonb

Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.


# 1.69 24-Oct-2001 itojun

more whitespace sync with kame


# 1.68 18-Oct-2001 itojun

branches: 1.68.2;
simplify per-if stats.


# 1.67 15-Oct-2001 itojun

sync with kame.
net.inet6.icmp6.nodeinfo is now a bitmap (2^0 = ping6 -w, 2^1 = ping6 -a).
give up local if there's mbuf alloc failures.
cope with ".." in hostname.
sync comments/whitespaces.


Revision tags: thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base
# 1.66 22-Jun-2001 itojun

branches: 1.66.2;
remove RFC1885 compatibility code in #ifdef COMPAT_RFC1885, for icmp6
reply packet size consideration (obsolete, not used for a long time).
sync with kame


# 1.65 01-Jun-2001 itojun

use default hoplimit when incoming interface is not given to icmp6_reflect.
sync with kame


# 1.64 08-May-2001 itojun

correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)


Revision tags: thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.63 04-Apr-2001 itojun

make sure rcvif is sane on call to icmp6_reflect


# 1.62 30-Mar-2001 itojun

enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.


# 1.61 21-Mar-2001 itojun

set rmx_mtu to L2 interface mtu, instead of 0, on mtudisc timeout.
ip6_output() change is for safety. sync with kame


# 1.60 08-Mar-2001 itojun

remove bogus rtfree. sync with kame. inspired by openbsd PR 1706.


# 1.59 01-Mar-2001 itojun

branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code


# 1.58 11-Feb-2001 itojun

pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).


# 1.57 11-Feb-2001 itojun

recover $NetBSD$ (removed by mistake)


# 1.56 10-Feb-2001 itojun

to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.


# 1.55 08-Feb-2001 itojun

implement upper limit to icmp6 redirects (experimental, turned off)
negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation.
sync with kame.


# 1.54 07-Feb-2001 itojun

remove bogus DIAGNOSTIC. sync with kame


# 1.53 07-Feb-2001 itojun

during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).


# 1.52 24-Jan-2001 itojun

- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation


# 1.51 16-Jan-2001 itojun

s/ND6DEBUG/ND6_DEBUG/ to meet other places


# 1.50 08-Jan-2001 itojun

wrap icmp6 checksum error printf() into #ifdef ND6DEBUG.
sync with kame, NetBSD PR 11911.


# 1.49 11-Dec-2000 itojun

no need to rtalloc1() twice in pmtud. from kame


# 1.48 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


# 1.47 11-Nov-2000 itojun

improve spec conformance of node information query (07).
sync with kame.


# 1.46 18-Oct-2000 itojun

verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync


# 1.45 10-Oct-2000 itojun

sync with kame ($KAME$)


# 1.44 02-Oct-2000 itojun

fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.


# 1.43 16-Sep-2000 itojun

kame sys/netinet6/icmp6.c 1.140 -> 1.144
> in the check for the incoming redirect message, examine the gateway
> (from the routing table) only when the address family of the gateway is
> AF_INET6.


# 1.42 19-Aug-2000 itojun

- icmp6 nodeinfo: remove possibility of unaligned pointer access.
- jumbo payload output: fix incorrect mbuf manipulation
- pedant: align issues, mbuf assumption
(sync with kame)


# 1.41 03-Aug-2000 itojun

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.


# 1.40 03-Aug-2000 itojun

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.


# 1.39 30-Jul-2000 itojun

sync comment with reality


# 1.38 28-Jul-2000 itojun

nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit


# 1.37 09-Jul-2000 itojun

add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
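
Event-per-second rate limiting of the kind mentioned above can be modelled
in user space as below. This is a sketch under assumptions: the toy_pps
structure, the explicit now_sec argument and the function name are
hypothetical, and the kernel routine's actual signature and timekeeping are
not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limiter state; the kernel interface differs. */
struct toy_pps {
	long last_sec;   /* second in which the current window started */
	int count;       /* events already allowed in that second */
};

/*
 * Return 1 if an event at time now_sec may proceed, 0 if it would
 * exceed max_pps events within the current one-second window.
 */
static int
toy_pps_check(struct toy_pps *st, long now_sec, int max_pps)
{
	assert(st != NULL);
	if (now_sec != st->last_sec) {
		/* New second: start a fresh window. */
		st->last_sec = now_sec;
		st->count = 0;
	}
	if (st->count >= max_pps)
		return 0;
	st->count++;
	return 1;
}
```

An ICMPv6 error path would call such a check before building a reply and
silently drop the error when it returns 0.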


# 1.36 07-Jul-2000 itojun

sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.


# 1.35 06-Jul-2000 itojun

- do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).


# 1.34 28-Jun-2000 mrg

<vm/vm.h> -> <uvm/uvm_extern.h>


Revision tags: netbsd-1-5-base
# 1.33 13-Jun-2000 itojun

branches: 1.33.2;
signedness issue with char, take 2. confirmed with i386 cc -funsigned-char.


# 1.32 13-Jun-2000 itojun

workaround to suppress warning on char == unsigned char arch.


# 1.31 12-Jun-2000 itojun

better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was chimera of 03 and 05 draft.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.


Revision tags: minoura-xpg4dl-base
# 1.30 22-May-2000 itojun

branches: 1.30.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).


# 1.29 09-May-2000 itojun

do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).


# 1.28 13-Apr-2000 itojun

do not return icmp6 error against icmp6 error.
(this is due to a bug in header chain chasing)


# 1.27 22-Mar-2000 itojun

use ip6_{last,next}hdr in icmp6 inbound packet parsing.


# 1.26 01-Mar-2000 itojun

introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)


# 1.25 28-Feb-2000 itojun

fix ICMPv6 redirect input. the bug can result in invalid ND entry.


# 1.24 28-Feb-2000 itojun

support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield change in 04 draft to 05 draft, which makes
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.


# 1.23 26-Feb-2000 itojun

bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.


# 1.22 17-Feb-2000 darrenr

Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.


# 1.21 15-Feb-2000 thorpej

Fix a couple of brainos in the last.


# 1.20 14-Feb-2000 thorpej

Use ratecheck() for ICMP6 rate limiting.


Revision tags: chs-ubc2-newbase
# 1.19 06-Feb-2000 itojun

fix include pathname for better rfc2292 compliance.


# 1.18 16-Jan-2000 itojun

add missing ipcomp cases.


# 1.17 07-Jan-2000 itohy

Rename variable "prep" for PReP port.


# 1.16 06-Jan-2000 itojun

remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...


# 1.15 05-Jan-2000 itojun

avoid panic on getsockopt(ICMPV6_FILTER).


# 1.14 02-Jan-2000 itojun

add net.inet6.icmp6.nodeinfo sysctl.
this allows you to disable/enable ICMPv6 node information query/reply
processing (which tells remote end the gethostname(3) setting, interface
addresses on the node, and some other things - documented in
draft-ietf-ipngwg-icmp-name-lookup* or something alike).

to test it, try ping6 -w ::1 with nodeinfo=0 and nodeinfo=1.
(sync with kame change)


Revision tags: wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.13 15-Dec-1999 itojun

do not overwrite traffic class field when we write IPv6 version field.
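
The first byte of an IPv6 header holds the 4-bit version in its high nibble
and the top four bits of the traffic class in its low nibble, so a plain
assignment of the version value would wipe the traffic class bits. A minimal
sketch of the non-destructive update follows; the TOY_ constants and the
function name are hypothetical and the actual change in icmp6.c is not
reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IPV6_VERSION	0x60	/* version 6 in the high nibble */
#define TOY_IPV6_VERSION_MASK	0xf0	/* high nibble of the first byte */

/*
 * Set the version nibble of the first IPv6 header byte while keeping
 * the low nibble, which carries the top four bits of the traffic class.
 */
static uint8_t
toy_set_ip6_version(uint8_t vfc)
{
	vfc &= (uint8_t)~TOY_IPV6_VERSION_MASK;	/* clear the old version */
	vfc |= TOY_IPV6_VERSION;		/* write version 6 */
	assert((vfc & TOY_IPV6_VERSION_MASK) == TOY_IPV6_VERSION);
	return vfc;
}
```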


# 1.12 13-Dec-1999 itojun

sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)


Revision tags: comdex-fall-1999-base fvdl-softdep-base
# 1.11 01-Oct-1999 itojun

branches: 1.11.2; 1.11.8;
consistent logging for icmp6 redirects
XXX should make logs 1-liner so that duplicated logs can be compressed
by syslog(8)?


Revision tags: chs-ubc2-base
# 1.10 31-Jul-1999 itojun

sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).


# 1.9 30-Jul-1999 itojun

remove reference to in6_systm.h (file itself will be removed afterwards)


# 1.8 22-Jul-1999 itojun

- implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.


# 1.7 22-Jul-1999 itojun

change unnecessary u_long/long into u_int32_t or something relevant.
more fixes should follow.


# 1.6 09-Jul-1999 thorpej

defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).


# 1.5 06-Jul-1999 itojun

sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups


# 1.4 06-Jul-1999 itojun

checked build on alpha and i386, with GENERIC.v6.
fixed several sizeof(void *) and sizeof(size_t) issues on alpha.

Thanks to: Dave Huang and Tim Rightnour


# 1.3 03-Jul-1999 thorpej

RCS ID police.


# 1.2 01-Jul-1999 itojun

branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


# 1.1 28-Jun-1999 itojun

branches: 1.1.2;
file icmp6.c was initially added on branch kame.