History log of /freebsd-11-stable/sys/netinet6/nd6_nbr.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 359803 11-Apr-2020 ae

MFC r359498:
Ignore ND6 neighbor advertisement received for static link-layer entries.

Previously such NA could override manually created LLE.

Reported by: Martin Beran <martin at mber cz>


# 332819 20-Apr-2018 jtl

MFC r331488:
This change adds a flag to the DAD entry to indicate whether it is
currently on the queue. This prevents accidentally doubly-removing a DAD
entry from the queue, while also simplifying some of the logic in
nd6_dad_stop().

Sponsored by: Netflix, Inc.


# 331722 29-Mar-2018 eadler

Revert r330897:

This was intended to be a non-functional change. It wasn't. The commit
message was thus wrong. In addition it broke arm, and merged crypto
related code.

Revert with prejudice.

This revert skips files touched in r316370 since that commit was since
MFCed. This revert also skips files that require $FreeBSD$ property
changes.

Thank you to those who helped me get out of this mess including but not
limited to gonzo, kevans, rgrimes.

Requested by: gjb (re)


# 330897 14-Mar-2018 eadler

Partial merge of the SPDX changes

These changes are incomplete but are making it difficult
to determine what other changes can/should be merged.

No objections from: pfg


# 327337 29-Dec-2017 ae

MFC r326876:
Follow the RFC6980 and silently ignore following IPv6 NDP messages
that had the IPv6 fragmentation header:
o Neighbor Solicitation
o Neighbor Advertisement
o Router Solicitation
o Router Advertisement
o Redirect

Introduce M_FRAGMENTED mbuf flag, and set it after IPv6 fragment reassembly
is completed. Then check the presence of this flag in correspondig ND6
handling routines.

PR: 224247


# 317067 17-Apr-2017 asomers

MFC r313025, r313395, r314113, r314442, r315458, r315656

r313025:
Add tests for multi-fib IPv6 routing

PR: 196361
Submitted by: jhujhiti@adjectivism.org
Reported by: Jason Healy <jhealy@logn.net>
MFC after: 4 weeks
Sponsored by: Spectra Logic Corp

r313395:
Add fibs_test:udp_dontroute6, another IPv6 multi-FIB test

PR: 196361
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp

r314113:
Remove tests/sys/netinet/fibs_tests's dependency on net/socat

Instead of bridging two tap interfaces with socat, just use an epair pair.

MFC after: 3 weeks
Sponsored by: Spectra Logic Corp

r314442:
Add an ATF test for IPv6 SLAAC with multiple fibs

Tests that an interface can get a SLAAC address and that it inserts its
routes into the correct fib. Does not test anything to do with NDP.

PR: 196361
Reviewed by: Erick Turnquist <jhujhiti@adjectivism.org>
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D9776

r315458:
Constrain IPv6 routes to single FIBs when net.add_addr_allfibs=0

sys/netinet6/icmp6.c
Use the interface's FIB for source address selection in ICMPv6 error
responses.

sys/netinet6/in6.c
In in6_newaddrmsg, announce arrival of local addresses on the
interface's FIB only. In in6_lltable_rtcheck, use a per-fib ND6
cache instead of a single cache.

sys/netinet6/in6_src.c
In in6_selectsrc, use the caller's fib instead of the default fib.
In in6_selectsrc_socket, remove a superfluous check.

sys/netinet6/nd6.c
In nd6_lle_event, use the interface's fib for routing socket
messages. In nd6_is_new_addr_neighbor, check all FIBs when trying
to determine whether an address is a neighbor. Also, simplify the
code for point to point interfaces.

sys/netinet6/nd6.h
sys/netinet6/nd6.c
sys/netinet6/nd6_rtr.c
Make defrouter_select fib-aware, and make all of its callers pass in
the interface fib.

sys/netinet6/nd6_nbr.c
When inputting a Neighbor Solicitation packet, consider the
interface fib instead of the default fib for DAD. Output NS and
Neighbor Advertisement packets on the correct fib.

sys/netinet6/nd6_rtr.c
Allow installing the same host route on different interfaces in
different FIBs. If rt_add_addr_allfibs=0, only install or delete
the prefix route on the interface fib.

tests/sys/netinet/fibs_test.sh
Clear some expected failures, but add a skip for the newly revealed
BUG217871.

PR: 196361
Submitted by: Erick Turnquist <jhujhiti@adjectivism.org>
Reported by: Jason Healy <jhealy@logn.net>
Reviewed by: asomers
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D9451

r315656:
Fix back-to-back runs of sys/netinet/fibs_test;slaac_on_nondefault_fib6

This test was failing if run twice because rtadvd takes too long to die.
The rtadvd process from the first run was still running when the
second run created its interfaces. The solution is to use SIGKILL during
the cleanup instead of SIGTERM so rtadvd will die faster.

While I'm here, randomize the addresses used for the test, which makes bugs
like this easier to spot, and fix the cleanup order to be the opposite of
the setup order

PR: 217871
MFC after: 18 days
X-MFC-With: 315458
Sponsored by: Spectra Logic Corp


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 301213 02-Jun-2016 markj

Always start IPv6 DAD asynchronously.

Otherwise we transmit the first neighbour solicitation in the context of the
caller of nd6_dad_start(), which can easily result in lock recursion. When
DAD is to be started after some delay, we send the first NS from the DAD
callout handler, so just change the implementation to do this in the
non-delayed case as well.

Reviewed by: ae, hrs
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6639


# 296991 17-Mar-2016 markj

Modify defrouter_remove() to perform the router lookup before removal.

This allows some simplification of its callers. No functional change
intended.

Tested by: Larry Rosenman (as part of a larger change)
MFC after: 1 month


# 296063 25-Feb-2016 markj

Lock the NDP default router list and count defrouter references.

This addresses a number of race conditions that can cause crashes as a
result of unsynchronized access to the list.

PR: 206904
Tested by: Larry Rosenman <ler@lerctr.org>,
Kevin Bowling <kevin.bowling@kev009.com>
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D5315


# 295732 17-Feb-2016 markj

Release the ref acquired in nd6_dad_find() if DAD is already in progress.

MFC after: 1 week


# 295579 12-Feb-2016 markj

Remove a bogus comment from nd6_na_input().

The splnet() call that it refers to has been removed, and a lock for the
default router list is in fact needed.

MFC after: 1 week


# 295578 12-Feb-2016 markj

Remove superfluous return statements from the neighbour discovery code.

MFC after: 1 week


# 293633 10-Jan-2016 melifaro

Split in6_selectsrc() into in6_selectsrc_addr() and in6_selectsrc_socket().

in6_selectsrc() has 2 class of users: socket-based one (raw/udp/pcb/etc) and
socket-less (ND code). The main reason for that change is inability to
specify non-default FIB for callers w/o socket since (internally) inpcb
is used to determine fib.

As as result, add 2 wrappers for in6_selectsrc() (making in6_selectsrc()
static):
1) in6_selectsrc_socket() for the former class. Embed scope_ambiguous check
along with returning hop limit when needed.
2) in6_selectsrc_addr() for the latter case. Add 'fibnum' argument and
pass IPv6 address w/ explicitly specified scope as separate argument.

Reviewed by: ae (previous version)


# 293159 04-Jan-2016 melifaro

Add rib_lookup_info() to provide API for retrieving individual route
entries data in unified format.

There are control plane functions that require information other than
just next-hop data (e.g. individual rtentry fields like flags or
prefix/mask). Given that the goal is to avoid rte reference/refcounting,
re-use rt_addrinfo structure to store most rte fields. If caller wants
to retrieve key/mask or gateway (which are sockaddrs and are allocated
separately), it needs to provide sufficient-sized sockaddrs structures
w/ ther pointers saved in passed rt_addrinfo.

Convert:
* lltable new records checks (in_lltable_rtcheck(),
nd6_is_new_addr_neighbor().
* rtsock pre-add/change route check.
* IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because
1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should
not be multiple host routes for such hosts 2) if we have multiple
routes we should inspect them (which is not done). 3) the entire idea
of abusing KRT as storage for ND proxy seems odd. Userland programs
should be used for that purpose).


# 293101 03-Jan-2016 melifaro

Remove 'struct route_int6' argument from in6_selectsrc() and
in6_selectif().

The main task of in6_selectsrc() is to return IPv6 SAS (along with
output interface used for scope checks). No data-path code uses
route argument for caching. The only users are icmp6 (reflect code),
ND6 ns/na generation code. All this fucntions are control-plane, so
there is no reason to try to 'optimize' something by passing cached
route into to ip6_output(). Given that, simplify code by eliminating
in6_selectsrc() 'struct route_in6' argument. Since in6_selectif() is
used only by in6_selectsrc(), eliminate its 'struct route_in6' argument,
too. While here, reshape rte-related code inside in6_selectif() to
free lookup result immediately after saving all the needed fields.


# 293038 01-Jan-2016 melifaro

Use lltable_get_ifp() instead of direct access to lltable fields.


# 292978 31-Dec-2015 melifaro

Implement interface link header precomputation API.

Add if_requestencap() interface method which is capable of calculating
various link headers for given interface. Right now there is support
for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
Other types are planned to support more complex calculation
(L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
(like L2 header for route w/gateway) down to the stack eliminating the
need for other lookups. It also brings us closer to more complex scenarios
like transparently handling MPLS nexthops and tunnel interfaces.
Last, but not least, it removes layering violation introduced by flowtable
code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
record. Interface link address change are handled by re-calculating
headers for all lles based on if_lladdr event. After these changes,
arpresolve()/nd6_resolve() returns full pre-calculated header for
supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
difference is that interface source mac has to be filled by OS for
AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
BPF and not pollute if_output() routines. Convert BPF to pass prepend data
via new 'struct route' mechanism. Note that it does not change
non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
It is not needed for ethernet anymore. The only remaining FDDI user is
dev/pdq mostly untouched since 2007. FDDI support was eliminated from
OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
Flowtable violates layering by saving (and not correctly managing)
rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
header data from that lle.

Differential Revision: https://reviews.freebsd.org/D4102


# 292402 17-Dec-2015 smh

Revert r292275 & r292379

glebius has concerns about these changes so reverting those can be discussed
and addressed.

Sponsored by: Multiplay


# 292379 16-Dec-2015 smh

Fix issues introduced by r292275

* Fix panic for etherswitches which don't have a LLADDR.
* Disabled DELAY in unsolicited NDA, which needs further work.
* Fixed missing DELAY in carp_send_na.
* style(9) fix.

Reported by: kp & melifaro
X-MFC-With: r292275
MFC after: 1 month
Sponsored by: Multiplay


# 292275 15-Dec-2015 smh

Fix lagg failover due to missing notifications

When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results is slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire a
ifnet_link_event which are now monitored by rip and nd6.

Upon receiving these events each protocol trigger the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behavour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR: 156226
MFC after: 1 month
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4111


# 292155 13-Dec-2015 melifaro

Remove LLE read lock from IPv6 fast path.

LLE structure is mostly unchanged during its lifecycle: there are only 2
things relevant for fast path lookup code:
1) link-level address change. Since r286722, these updates are performed
under AFDATA WLOCK.
2) Some sort of feedback indicating that this particular entry is used so
we send NS to perform reachability verification instead of expiring entry.
The only signal that is needed from fast path is something like binary
yes/no.
The latter is solved by the following changes:

Special r_skip_req (introduced in D3688) value is used for fast path feedback.
It is read lockless by fast path, but updated under req_mutex mutex. If this
field is non-zero, then fast path will acquire lock and set it back to 0.

After transitioning to STALE state, callout timer is armed to run each
V_nd6_delay seconds to make sure that if packet was transmitted at the start
of given interval, we would be able to switch to PROBE state in V_nd6_delay
seconds as user expects.
(in STALE state) timer is rescheduled until original V_nd6_gctimer expires
keeping lle in STALE state (remaining timer value stored in lle_remtime).
(in STALE state) timer is rescheduled if packet was transmitted less that
V_nd6_delay seconds ago to make sure we transition to PROBE state exactly
after V_n6_delay seconds.

As a result, all packets towards lle in REACHABLE/STALE/PROBE states are handled
by fast path without acquiring lle read lock.

Differential Revision: https://reviews.freebsd.org/D3780


# 290486 07-Nov-2015 melifaro

Unify setting lladdr for AF_INET[6].


# 288600 03-Oct-2015 hrs

- Schedule DAD for IN6_IFF_TENTATIVE addresses in nd6_timer(). This
catches cases that DAD probes cannot be sent because of
IFF_UP && !IFF_DRV_RUNNING.

- nd6_dad_starttimer() now calls nd6_dad_ns_output(), instead of
calling it before nd6_dad_starttimer().

- Do not release an entry in dadq when a duplicate entry is being
added.


# 288301 27-Sep-2015 melifaro

Eliminate nd6_nud_hint() and its TCP bindings.

Initially function was introduced in r53541 (KAME initial commit) to
"provide hints from upper layer protocols that indicate a connection
is making "forward progress"" (quote from RFC 2461 7.3.1 Reachability
Confirmation).
However, it was converted to do nothing (e.g. just return) in r122922
(tcp_hostcache implementation) back in 2003. Some defines were moved
to tcp_var.h in r169541. Then, it was broken (for non-corner cases)
by r186119 (L2<>L3 split) in 2008 (NULL ifp in nd6_lookup). So,
right now this code is broken and has no "real" base users.

Differential Revision: https://reviews.freebsd.org/D3699


# 288062 21-Sep-2015 melifaro

Unify nd6 state switching by using newly-created nd6_llinfo_setstate()
function. The change is mostly mechanical with the following exception:
Last piece of nd6_resolve_slow() was refactored: ND6_LLINFO_PERMANENT
condition was removed as always-true, explicit ND6_LLINFO_NOSTATE ->
ND6_LLINFO_INCOMPLETE state transition was removed as duplicate.

Reviewed by: ae
Sponsored by: Yandex LLC


# 287484 05-Sep-2015 melifaro

Do not pass lle to nd6_ns_output(). Use newly-added
nd6_llinfo_get_holdsrc() to extract desired IPv6 source
from holdchain and pass it to the nd6_ns_output().


# 287094 24-Aug-2015 hrs

- Deprecate IN6_IFF_NODAD. It was used to prevent DAD on a loopback
interface but in6if_do_dad() already had a check for IFF_LOOPBACK.

- Remove in6if_do_dad() check in in6_broadcast_ifa(). An address
which needs DAD always has IN6_IFF_TENTATIVE there.

- in6if_do_dad() now returns EAGAIN when the interface is not ready
since DAD callout handler ignores such an interface.

- In DAD callout handler, mark an address as IN6_IFF_TENTATIVE
when the interface has ND6_IFF_IFDISABLED. And Do IFF_UP and
IFF_DRV_RUNNING check consistently when DAD is required.

- draft-ietf-6man-enhanced-dad is now published as RFC 7527.

- Fix some typos.


# 286624 11-Aug-2015 melifaro

Store addresses instead of sockaddrs inside llentry.
This permits us having all (not fully true yet) all the info
needed in lookup process in first 64 bytes of 'struct llentry'.

struct llentry layout:
BEFORE:
[rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]]
AFTER
[ in[6]_addr MAC .. state .. rwlock ]

Currently, address part of struct llentry has only 16 bytes for the key.
However, lltable does not restrict any custom lltable consumers with long
keys use the previous approach (store key at (lle+1)).

Sponsored by: Yandex LLC


# 282805 12-May-2015 hrs

- Remove ND6_IFF_IGNORELOOP. This functionality was useless in practice
because a link where looped back NS messages are permanently observed
does not work with either NDP or ARP for IPv4.

- draft-ietf-6man-enhanced-dad is now RFC 7527.

Discussed with: hiren
MFC after: 3 days


# 282354 02-May-2015 glebius

Remove #ifdef IFT_FOO.

Submitted by: Guy Yur <guyyur gmail.com>


# 281309 09-Apr-2015 ae

Fix the check for maximum mbuf's size needed to send ND6 NA and NS.
It is acceptable that the size can be equal to MCLBYTES. In the later
KAME's code this check has been moved under DIAGNOSTIC ifdef, because
the size of NA and NS is much smaller than MCLBYTES. So, it is safe to
replace the check with KASSERT.

PR: 199304
Discussed with: glebius
MFC after: 1 week


# 280236 18-Mar-2015 ae

To avoid a possible race, release the reference to ifa after return
from nd6_dad_na_input().

Submitted by: Alexandre Martins
MFC after: 1 week


# 279676 05-Mar-2015 hrs

- Implement loopback probing state in enhanced DAD algorithm.

- Add no_dad and ignoreloop per-IF knob. no_dad disables DAD completely,
and ignoreloop is to prevent infinite loop in loopback probing state when
loopback is permanently expected.


# 279564 03-Mar-2015 ae

Create nd6_ns_output_fib() function with extra argument fibnum. Use it
to initialize mbuf's fibnum. Uninitialized fibnum value can lead to
panic in the routing code. Currently we use only RT_DEFAULT_FIB value
for initialization.

Differential Revision: https://reviews.freebsd.org/D1998
Reviewed by: hrs (previous version)
Sponsored by: Yandex LLC


# 279559 03-Mar-2015 hrs

Nonce has to be non-NULL for DAD even if net.inet6.ip6.dad_enhanced=0.


# 279531 02-Mar-2015 hrs

Implement Enhanced DAD algorithm for IPv6 described in
draft-ietf-6man-enhanced-dad-13.

This basically adds a random nonce option (RFC 3971) to NS messages
for DAD probe to detect a looped back packet. This looped back packet
prevented DAD on some pseudo-interfaces which aggregates multiple L2 links
such as lagg(4).

The length of the nonce is set to 6 bytes. This algorithm can be disabled by
setting net.inet6.ip6.dad_enhanced sysctl to 0 in a per-vnet basis.

Reported by: hiren
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D1835


# 276844 08-Jan-2015 melifaro

* Use newly-created nd6_grab_holdchain() function to retrieve lle
hold mbuf chain instead of calling full-blown nd6_output_lle()
for each packet. This simplifies both callers and nd6_output_lle()
implementation.
* Make nd6_output_lle() static and remove now-unused lle and chain
arguments.
* Rename nd6_output_flush() -> nd6_flush_holdchain() to be consistent.
* Move all pre-send transmit hooks to newly-created nd6_output_ifp().
Now nd6_output(), nd6_output_lle() and nd6_flush_holdchain() are using
it to send mbufs to if_output.
* Remove SeND hook from nd6_na_input() because it was implemented
incorrectly since the beginning (r211501):
- it tagged initial input mbuf (m) instead of m_hold
- tagging _all_ mbufs in holdchain seems to be wrong anyway.


# 276692 05-Jan-2015 rwatson

To ease changes to underlying mbuf structure and the mbuf allocator, reduce
the knowledge of mbuf layout, and in particular constants such as M_EXT,
MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment
utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN() in a single
M_ALIGN() macro, implemented by a now-inlined m_align() function:

- Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline.
- Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align().
- Update consumers around the tree to simply use M_ALIGN().

This change eliminates a number of cases where mbuf consumers must be aware
of whether or not mbufs returned by the allocator use external storage, but
also assumptions about the size of the returned mbuf. This will make it
easier to introduce changes in how we use external storage, as well as
features such as variable-size mbufs.

Differential Revision: https://reviews.freebsd.org/D1436
Reviewed by: glebius, trasz, gnn, bz
Sponsored by: EMC / Isilon Storage Division


# 275700 11-Dec-2014 markj

Revert r275695: nd6_dad_find() was already correct.

Reported by: ae, kib
Pointy hat to: markj


# 275695 10-Dec-2014 markj

Fix a bug in r266857: nd6_dad_find() must return NULL if it doesn't find
a matching element in the DAD queue.

Reported by: Holger Hans Peter Freyther <holger@freyther.de>
MFC after: 3 days


# 275593 08-Dec-2014 markj

Add refcounting to IPv6 DAD objects and simplify the DAD code to fix a
number of races which could cause double frees or use-after-frees when
performing DAD on an address. In particular, an IPv6 address can now only be
marked as a duplicate from the DAD callout.

Differential Revision: https://reviews.freebsd.org/D1258
Reviewed by: ae, hrs
Reported by: rstone
MFC after: 1 month


# 274347 10-Nov-2014 melifaro

* Make nd6_dad_duplicated() constant.
* Simplify refcounting by using nd6_dad_add() / nd6_dad_del().

Reviewed by: ae
MFC after: 2 weeks
Sponsored by: Yandex LLC


# 274316 09-Nov-2014 melifaro

Remove unused 'struct route *' argument from nd6_output_flush().


# 266857 29-May-2014 hrs

Add rwlock to struct dadq. A panic could occur when a large number of
addresses performed DAD at the same time.


# 266248 16-May-2014 hrs

Cancel DAD for an ifa when the ifp has ND6_IFF_IFDISABLED as early as
possible and do not clear IN6_IFF_TENTATIVE. If IFDISABLED was accidentally
set after a DAD started, TENTATIVE could be cleared because no NA was
received due to IFDISABLED, and as a result it could prevent DAD when
manually clearing IFDISABLED after that.


# 264976 26-Apr-2014 melifaro

Use "hash" value in rtalloc_mpath_fib() instead of RTF_ANNOUNCE flag.
Hashing method is the same as in in6_src.c. (Probably we need better one).

MFC after: 2 weeks


# 260187 02-Jan-2014 ae

lla_lookup() does modification only when LLE_CREATE is specified.
Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing
lla_lookup() without LLE_CREATE flag.

Reviewed by: glebius, adrian
MFC after: 1 week
Sponsored by: Yandex LLC


# 257985 11-Nov-2013 ae

Fix panic with RADIX_MPATH, when RTFREE_LOCKED() called for already
unlocked route. Use in6_rtalloc() instead of in6_rtalloc1. This helps
simplify the code and remove several now unused variables.

PR: 156283
MFC after: 2 weeks


# 248328 15-Mar-2013 glebius

- Use m_getcl() instead of hand allocating.
- Do not calculate constant length values at run time,
CTASSERT() their sanity.
- Remove superfluous cleaning of mbuf fields after allocation.
- Replace compat macros with function calls.

Sponsored by: Nginx, Inc.


# 245925 25-Jan-2013 np

Generate lle_event in the IPv6 neighbor discovery code too.

Reviewed by: bz@


# 243882 05-Dec-2012 glebius

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


# 241686 18-Oct-2012 andre

Mechanically remove the last stray remains of spl* calls from net*/*.
They have been Noop's for a long time now.


# 238092 04-Jul-2012 glebius

When ip_output()/ip6_output() is supplied a struct route *ro argument,
it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning
here: it may be supplied to provide route, and it may be supplied to
store and return to caller the route that ip_output()/ip6_output()
finds. In the latter case skipping FLOWTABLE lookup is pessimisation.

The difference between struct route filled by FLOWTABLE and filled
by rtalloc() family is that the former doesn't hold a reference on
its rtentry. Reference is hold by flow entry, and it is about to
be released in future. Thus, route filled by FLOWTABLE shouldn't
be passed to RTFREE() macro.

- Introduce new flag for struct route/route_in6, that marks route
not holding a reference on rtentry.
- Introduce new macro RO_RTFREE() that cleans up a struct route
depending on its kind.
- All callers to ip_output()/ip6_output() that do supply non-NULL
but empty route should use RO_RTFREE() to free results of
lookup.
- ip_output()/ip6_output() now do FLOWTABLE lookup always when
ro->ro_rt == NULL.

Tested by: tuexen (SCTP part)


# 231852 17-Feb-2012 bz

Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:

Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by: Cisco Systems, Inc.
Reviewed by: melifaro (basically)
MFC after: 10 days


# 228571 16-Dec-2011 glebius

A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]


# 226340 13-Oct-2011 glebius

Use TAILQ_FOREACH() in the nd6_dad_find() instead of hand-rolled implementation.


# 222728 06-Jun-2011 hrs

- Accept Router Advertisement messages even when net.inet6.ip6.forwarding=1.

- A new per-interface knob IFF_ND6_NO_RADR and sysctl IPV6CTL_NO_RADR.
This controls if accepting a route in an RA message as the default route.
The default value for each interface can be set by net.inet6.ip6.no_radr.
The system wide default value is 0.

- A new sysctl: net.inet6.ip6.norbit_raif. This controls if setting R-bit in
NA on RA accepting interfaces. The default is 0 (R-bit is set based on
net.inet6.ip6.forwarding).

Background:

IPv6 host/router model suggests a router sends an RA and a host accepts it for
router discovery. Because of that, KAME implementation does not allow
accepting RAs when net.inet6.ip6.forwarding=1. Accepting RAs on a router can
make the routing table confused since it can change the default router
unintentionally.

However, in practice there are cases where we cannot distinguish a host from
a router clearly. For example, a customer edge router often works as a host
against the ISP, and as a router against the LAN at the same time. Another
example is a complex network configurations like an L2TP tunnel for IPv6
connection to Internet over an Ethernet link with another native IPv6 subnet.
In this case, the physical interface for the native IPv6 subnet works as a
host, and the pseudo-interface for L2TP works as the default IP forwarding
route.

Problem:

Disabling processing RA messages when net.inet6.ip6.forwarding=1 and
accepting them when net.inet6.ip6.forward=0 cause the following practical
issues:

- A router cannot perform SLAAC. It becomes a problem if a box has
multiple interfaces and you want to use SLAAC on some of them, for
example. A customer edge router for IPv6 Internet access service
using an IPv6-over-IPv6 tunnel sometimes needs SLAAC on the
physical interface for administration purpose; updating firmware
and so on (link-local addresses can be used there, but GUAs by
SLAAC are often used for scalability).

- When a host has multiple IPv6 interfaces and it receives multiple RAs on
them, controlling the default route is difficult. Router preferences
defined in RFC 4191 works only when the routers on the links are
under your control.

Details of Implementation Changes:

Router Advertisement messages will be accepted even when
net.inet6.ip6.forwarding=1. More precisely, the conditions are as
follow:

(ACCEPT_RTADV && !NO_RADR && !ip6.forwarding)
=> Normal RA processing on that interface. (as IPv6 host)

(ACCEPT_RTADV && (NO_RADR || ip6.forwarding))
=> Accept RA but add the router to the defroute list with
rtlifetime=0 unconditionally. This effectively prevents
from setting the received router address as the box's
default route.

(!ACCEPT_RTADV)
=> No RA processing on that interface.

ACCEPT_RTADV and NO_RADR are per-interface knob. In short, all interface
are classified as "RA-accepting" or not. An RA-accepting interface always
processes RA messages regardless of ip6.forwarding. The difference caused by
NO_RADR or ip6.forwarding is whether the RA source address is considered as
the default router or not.

R-bit in NA on the RA accepting interfaces is set based on
net.inet6.ip6.forwarding. While RFC 6204 W-1 rule (for CPE case) suggests
a router should disable the R-bit completely even when the box has
net.inet6.ip6.forwarding=1, I believe there is no technical reason with
doing so. This behavior can be set by a new sysctl net.inet6.ip6.norbit_raif
(the default is 0).

Usage:

# ifconfig fxp0 inet6 accept_rtadv
=> accept RA on fxp0
# ifconfig fxp0 inet6 accept_rtadv no_radr
=> accept RA on fxp0 but ignore default route information in it.
# sysctl net.inet6.ip6.norbit_no_radr=1
=> R-bit in NAs on RA accepting interfaces will always be set to 0.


# 220743 17-Apr-2011 bz

Fix IPv6 ND. After r219562 we in nd6_ns_input() were erroneously always
passing the cached proxydl reference (sockaddr_dl initialized or not) to
nd6_na_output(). nd6_na_output() will thus assume a proxy NA. Revert to
conditionally passing either &proxydl or NULL if no proxy case desired.

Tested by: ipv6gw and ref9-i386
Reported by: Pete French (petefrench ingresso.co.uk on stable)
Reported by: bz, simon on Y! cluster
Reported by: kib
PR: kern/151908
MFC after: 3 days


# 219819 21-Mar-2011 jeff

- Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.


# 219562 12-Mar-2011 bz

Make sure the locally cached value of rt->rt_gateway stays stable,
even after dropping the reference and unlocking. Previously we
have dereferenced a NULL pointer (after r121765).
Simply unlocking after the block does not work either because of
lock ordering (see r121765) and in addition we would still hold
a pointer to something that might be gone by the time we access it.
Thus take a copy of the value rather than just caching the pointer.

PR: kern/151908
Submitted by: chenyl (netstar2008 126.com) (initial version)
MFC after: 2 weeks


# 216022 28-Nov-2010 bz

Plug well observed races on la_hold entries with the callout handler.

Call the handler function with the lock held, return unlocked as we
might free the entry. Rework functions later in the call graph to be
either called with the lock held or, only if needed, unlocked.

Place asserts to document and tighten assumptions on various lle locking,
which were not always true before.

We call nd6_ns_output() unlocked and the assignment of ip6->ip6_src was
decentralized to minimize possible complexity introduced with the formerly
missing locking there. This also resulted in a push down of local
variable scopes into smaller blocks.

Reported by: many
PR: kern/148857
Submitted by: Dmitrij Tejblum (tejblum yandex-team.ru) (original version)
MFC After: 4 days


# 215701 22-Nov-2010 dim

After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


# 215559 20-Nov-2010 bz

In case of an early return from the function there is no need to zero
the route upfront, so defer as long as we can.

MFC after: 3 days


# 215317 14-Nov-2010 dim

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.


# 211501 19-Aug-2010 anchie

MFp4: anchie_soc2009 branch:

Add kernel side support for Secure Neighbor Discovery (SeND), RFC 3971.

The implementation consists of a kernel module that gets packets from
the nd6 code, sends them to user space on a dedicated socket and reinjects
them back for further processing.

Hooks are used from nd6 code paths to divert relevant packets to the
send implementation for processing in user space. The hooks are only
triggered if the send module is loaded. In case no user space
application is connected to the send socket, processing continues
normaly as if the module would not be loaded. Unloading the module
is not possible at this time due to missing nd6 locking.

The native SeND socket is similar to a raw IPv6 socket but with its own,
internal pseudo-protocol.

Approved by: bz (mentor)


# 211193 11-Aug-2010 will

Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by: bz
Approved by: ken (mentor)


# 211157 10-Aug-2010 will

Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default. Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by: bz, simon
Approved by: ken (mentor)
MFC after: 2 weeks


# 207369 29-Apr-2010 bz

MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days


# 197138 12-Sep-2009 hrs

Improve flexibility of receiving Router Advertisement and
automatic link-local address configuration:

- Convert a sysctl net.inet6.ip6.accept_rtadv to one for the
default value of a per-IF flag ND6_IFF_ACCEPT_RTADV, not a
global knob. The default value of the sysctl is 0.

- Add a new per-IF flag ND6_IFF_AUTO_LINKLOCAL and convert a
sysctl net.inet6.ip6.auto_linklocal to one for its default
value. The default value of the sysctl is 1.

- Make ND6_IFF_IFDISABLED more robust. It can be used to disable
IPv6 functionality of an interface now.

- Receiving RA is allowed if ip6_forwarding==0 *and*
ND6_IFF_ACCEPT_RTADV is set on that interface. The former
condition will be revisited later to support a "host + router" box
like IPv6 CPE router. The current behavior is compatible with
the older releases of FreeBSD.

- The ifconfig(8) now supports these ND6 flags as well as "nud",
"prefer_source", and "disabled" in ndp(8). The ndp(8) now
supports "auto_linklocal".

Discussed with: bz and jinmei
Reviewed by: bz
MFC after: 3 days


# 196019 01-Aug-2009 rwatson

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# 195727 16-Jul-2009 rwatson

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


# 195699 14-Jul-2009 rwatson

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# 194777 23-Jun-2009 bz

Make callers to in6_selectsrc() and in6_pcbladdr() pass in memory
to save the selected source address rather than returning an
unreferenced copy to a pointer that might long be gone by the
time we use the pointer for anything meaningful.

Asked for by: rwatson
Reviewed by: rwatson


# 194760 23-Jun-2009 rwatson

Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references. The following routines now return references:

ifaddr_byindex
ifa_ifwithaddr
ifa_ifwithbroadaddr
ifa_ifwithdstaddr
ifa_ifwithnet
ifaof_ifpforaddr
ifa_ifwithroute
ifa_ifwithroute_fib
rt_getifa
rt_getifa_fib
IFP_TO_IA
ip_rtaddr
in6_ifawithifp
in6ifa_ifpforlinklocal
in6ifa_ifpwithaddr
in6_ifadd
carp_iamatch6
ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc). In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit. Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by: bz
Obtained from: Apple, Inc. (portions)
MFC after: 6 weeks (portions)


# 194602 21-Jun-2009 rwatson

Clean up common ifaddr management:

- Unify reference count and lock initialization in a single function,
ifa_init().
- Move tear-down from a macro (IFAFREE) to a function ifa_free().
- Move reference count bump from a macro (IFAREF) to a function ifa_ref().
- Instead of using a u_int protected by a mutex to refcount(9) for
reference count management.

The ifa_mtx is now used for exactly one ioctl, and possibly should be
removed.

MFC after: 3 weeks


# 191816 05-May-2009 zec

Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one. The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE(). Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by: julian (mentor)


# 190964 12-Apr-2009 rwatson

Update stats in struct icmpstat and icmp6stat using four new
macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and
ICMP6STAT_INC(), rather than directly manipulating the fields
of these structures across the kernel. This will make it
easier to change the implementation of these statistics,
such as using per-CPU versions of the data structures.

In on case, icmp6stat members are manipulated indirectly, by
icmp6_errcount(), and this will require further work to fix
for per-CPU stats.

MFC after: 3 days


# 187946 31-Jan-2009 bz

Like with r185713 make sure to not leak a lock as rtalloc1(9) returns
a locked route. Thus we have to use RTFREE_LOCKED(9) to get it unlocked
and rtfree(9)d rather than just rtfree(9)d.

Since the PR was filed, new places with the same problem were added
with new code. Also check that the rt is valid before freeing it
either way there.

PR: kern/129793
Submitted by: Dheeraj Reddy <dheeraj@ece.gatech.edu>
MFC after: 2 weeks
Committed from: Bugathon #6


# 186468 23-Dec-2008 kmacy

avoid lock recursion by deferring the link check until after LLE lock is dropped


# 186160 16-Dec-2008 kmacy

explicitly check return of lla_lookup against NULL


# 186119 15-Dec-2008 qingli

This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle, Kip has also been conducting
active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
me maintaining that branch before the svn conversion


# 185571 02-Dec-2008 bz

Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


# 185348 26-Nov-2008 zec

Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 185088 19-Nov-2008 zec

Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions. As a rule,
initialization at instatiation for such variables should never be
introduced again from now on. Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact. In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 183550 02-Oct-2008 zec

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 183529 01-Oct-2008 cperciva

Default to ignoring potentially evil IPv6 Neighbor Solicitation
messages.

Approved by: so (cperciva)
Approved by: re (kensmith)
Security: FreeBSD-SA-08:10.nd6
Thanks to: jinmei, bz


# 182150 25-Aug-2008 julian

another missed V_


# 181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# 178167 13-Apr-2008 qingli

This patch provides the back end support for equal-cost multi-path
(ECMP) for both IPv4 and IPv6. Previously, multipath route insertion
is disallowed. For example,

route add -net 192.103.54.0/24 10.9.44.1
route add -net 192.103.54.0/24 10.9.44.2

The second route insertion will trigger an error message of
"add net 192.103.54.0/24: gateway 10.2.5.2: route already in table"

Multiple default routes can also be inserted. Here is the netstat
output:

default 10.2.5.1 UGS 0 3074 bge0 =>
default 10.2.5.2 UGS 0 0 bge0

When multipath routes exist, the "route delete" command requires
a specific gateway to be specified or else an error message would
be displayed. For example,

route delete default

would fail and trigger the following error message:

"route: writing to routing socket: No such process"
"delete net default: not in table"

On the other hand,

route delete default 10.2.5.2

would be successful: "delete net default: gateway 10.2.5.2"

One does not have to specify a gateway if there is only a single
route for a particular destination.

I need to perform more testings on address aliases and multiple
interfaces that have the same IP prefixes. This patch as it
stands today is not yet ready for prime time. Therefore, the ECMP
code fragments are fully guarded by the RADIX_MPATH macro.
Include the "options RADIX_MPATH" in the kernel configuration
to enable this feature.

Reviewed by: robert, sam, gnn, julian, kmacy


# 175162 08-Jan-2008 obrien

un-__P()


# 174510 10-Dec-2007 obrien

Clean up VCS Ids.


# 171260 05-Jul-2007 delphij

Space cleanup

Approved by: re (rwatson)


# 171259 05-Jul-2007 delphij

ANSIfy[1] plus some style cleanup nearby.

Discussed with: gnn, rwatson
Submitted by: Karl Sj?dahl - dunceor <dunceor gmail com> [1]
Approved by: re (rwatson)


# 171133 01-Jul-2007 gnn

Commit IPv6 support for FAST_IPSEC to the tree.
This commit includes only the kernel files, the rest of the files
will follow in a second commit.

Reviewed by: bz
Approved by: re
Supported by: Secure Computing


# 169273 05-May-2007 suz

some minor modification to the previous commit to sys/netinet6/nd6.c and nd6_nbr.c.
- added some clarification comments
- removed an unnecesary code

Obtained from: KAME
MFC after: 1 week


# 169241 04-May-2007 suz

fixed a memory leak in unresolved ND queue processing

Obtained from: KAME
MFC after: 1 week


# 165118 12-Dec-2006 bz

MFp4: 92972, 98913 + one more change

In ip6_sprintf no longer use and return one of eight static buffers
for printing/logging ipv6 addresses.
The caller now has to hand in a sufficiently large buffer as first
argument.


# 160981 04-Aug-2006 brooks

With exception of the if_name() macro, all definitions in net_osdep.h
were unused or already in if_var.h so add if_name() to if_var.h and
remove net_osdep.h along with all references to it.

Longer term we may want to kill off if_name() entierly since all modern
BSDs have if_xname variables rendering it unnecessicary.


# 153227 08-Dec-2005 ume

the response NS to a DAD NS was not sent correctly due to the
invalid destination address.

Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
MFC after: 1 day


# 151539 21-Oct-2005 suz

sync with KAME regarding NDP

- introduced fine-grain-timer to manage ND-caches and IPv6 Multicast-Listeners
- supports Router-Preference <draft-ietf-ipv6-router-selection-07.txt>
- better prefix lifetime management
- more spec-comformant DAD advertisement
- updated RFC/internet-draft revisions

Obtained from: KAME
Reviewed by: ume, gnn
MFC after: 2 month


# 151536 21-Oct-2005 suz

sync with KAME (renamed a macro IPV6_DADOUTPUT to IPV6_UNSPECSRC)

Obtained from: KAME


# 151479 19-Oct-2005 suz

sync with KAME (nuked unused code, use NULL to denote a NULL pointer)

Obtained from: KAME
Reviewed by: ume, gnn


# 151477 19-Oct-2005 suz

sync with KAME regarding the following clarification in RFC3542:
- disable IPv6 operation if DAD fails for some EUI-64 link-local addresses.
- export get_hw_ifid() (and rename it) as a subroutine for this process.

Obtained from: KAME
Reviewd by: ume, gnn
MFC after: 2 week


# 151465 19-Oct-2005 suz

sync with KAME in the following points:
- fixed typos
- improved some comment descriptions
- use NULL, instead of 0, to denote a NULL pointer
- avoid embedding a magic number in the code
- use nd6log() instead of log() to record NDP-specific logs
- nuked an unnecessay white space

Obtained from: KAME
MFC after: 1 day


# 150202 15-Sep-2005 suz

plugged a possible memory leak

Obtained from: KAME
MFC after: 1 day


# 149829 06-Sep-2005 thompsa

Add support for multicast to the bridge and allow inet6 addresses to be
assigned to the interface.

IPv6 auto-configuration is disabled. An IPv6 link-local address has a
link-local scope within one link, the spec is unclear for the bridge case and
it may cause scope violation.

An address can be assigned in the usual way;
ifconfig bridge0 inet6 xxxx:...

Tested by: bmah
Reviewed by: ume (netinet6)
Approved by: mlaier (mentor)
MFC after: 1 week


# 148987 12-Aug-2005 ume

- fix typo in comment.
- nuke unused code.

Submitted by: suz
Obtained from: KAME


# 148887 09-Aug-2005 rwatson

Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by: pjd, bz
MFC after: 7 days


# 148385 25-Jul-2005 ume

scope cleanup. with this change
- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application do:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.

Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Obtained from: KAME


# 147306 11-Jun-2005 brooks

Fix IPv6 neighbor discovery by using IF_LLADDR to get the mac address
instead of a particularly ugly cast + pointer math hack.

Reported by: kuriyama, kris


# 142215 22-Feb-2005 glebius

Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by: mlaier
Obtained from: OpenBSD (mickey, mcbride)


# 139826 07-Jan-2005 imp

/* -> /*- for license, minor formatting changes, separate for KAME


# 128422 19-Apr-2004 luigi

ifp has the same value as rt->rti_ifp so remove the dependency
on the route entry to locate the necessary information.


# 125873 16-Feb-2004 ume

we don't need to include ipsec.h.

Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>


# 125396 03-Feb-2004 ume

pass pcb rather than so. it is expected that per socket policy
works again.


# 122062 04-Nov-2003 ume

- cleanup SP refcnt issue.
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all. secpolicy no longer contain
spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy. assign ID field to
all SPD entries. make it possible for racoon to grab SPD entry on
pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header. a mode is always needed
to compare them.
- fixed that the incorrect time was set to
sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- an user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
because key_spdacquire() had been implemented imcopletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
XXX in theory refcnt should do the right thing, however, we have
"spdflush" which would touch all SPs. another solution would be to
de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion. ipsec_*_policy ->
ipsec_*_pcbpolicy.
- avoid variable name confusion.
(struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
"src" of the spidx specifies ICMP type, and the port field in "dst"
of the spidx specifies ICMP code.
- avoid from applying IPsec transport mode to the packets when the
kernel forwards the packets.

Tested by: nork
Obtained from: KAME


# 121807 31-Oct-2003 ume

use arc4random.

Obtained from: KAME


# 121765 30-Oct-2003 sam

use a local variable to avoid holding a lock across a call out of view

Supported by: FreeBSD Foundation


# 121315 21-Oct-2003 ume

- change scope to zone.
- change node-local to interface-local.
- better error handling of address-to-scope mapping.
- use in6_clearscope().

Obtained from: KAME


# 121161 17-Oct-2003 ume

- add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.

Obtained from: KAME


# 120941 09-Oct-2003 ume

- typo in comment
- style
- ANSIfy
(there is no functional change.)

Obtained from: KAME


# 120049 14-Sep-2003 mdodd

Enable IPv6 for Token Ring.


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 108470 30-Dec-2002 schweikh

Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.


# 105194 15-Oct-2002 sam

Replace aux mbufs with packet tags:

o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit
ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
inpcb parameter to ip_output and ip6_output to allow the IPsec code to
locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version

Reviewed by: julian, luigi (silent), -arch, -net, darren
Approved by: julian, silence from everyone else
Obtained from: openbsd (mostly)
MFC after: 1 month


# 95023 19-Apr-2002 suz

just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD.
(based on freebsd4-snap-20020128)

Reviewed by: ume
MFC after: 1 week


# 92733 19-Mar-2002 peter

Pacify gcc-3.1.


# 78468 19-Jun-2001 sumikawa

Add IFT_L2VLAN for supported NDP type. IPv6 over VLAN works now.

Obtained from: KAME
MFC after: 2 weeks


# 78064 11-Jun-2001 ume

Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.

Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks


# 62587 04-Jul-2000 itojun

sync with kame tree as of july00. tons of bug fixes/improvements.

API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
(also syntax change)


# 61965 22-Jun-2000 ume

Inhibit successful DAD messages and "no default interface" messages.
It seems that people find them too noisy.
(ND6_DEBUG will enable them)

Obtained from: KAME Project


# 60938 26-May-2000 jake

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


# 60833 23-May-2000 jake

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


# 55009 22-Dec-1999 shin

IPSEC support in the kernel.
pr_input() routines prototype is also changed to support IPSEC and IPV6
chained protocol headers.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# 54263 07-Dec-1999 shin

udp IPv6 support, IPv6/IPv4 tunneling support in kernel,
packet divert at kernel for IPv6/IPv4 translater daemon

This includes queue related patch submitted by jburkhol@home.com.

Submitted by: queue related patch from jburkhol@home.com
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# 53626 23-Nov-1999 shin

Removed IPSEC and IPV6FIREWALL because they are not ready yet.


# 53541 22-Nov-1999 shin

KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project