History log of /freebsd-11-stable/sys/net/route.h
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 341241 29-Nov-2018 markj

MFC r340968:
Plug routing sysctl leaks.


# 331722 29-Mar-2018 eadler

Revert r330897:

This was intended to be a non-functional change. It wasn't. The commit
message was thus wrong. In addition it broke arm, and merged crypto
related code.

Revert with prejudice.

This revert skips files touched in r316370 since that commit was since
MFCed. This revert also skips files that require $FreeBSD$ property
changes.

Thank you to those who helped me get out of this mess including but not
limited to gonzo, kevans, rgrimes.

Requested by: gjb (re)


# 330897 14-Mar-2018 eadler

Partial merge of the SPDX changes

These changes are incomplete but are making it difficult
to determine what other changes can/should be merged.

No objections from: pfg


# 320134 20-Jun-2017 ae

MFC r319895:
Resurrect RTF_RNH_LOCKED flag and restore ability to call rtalloc1_fib()
with acquired RIB lock.

This fixes a possible panic due to trying to acquire RIB rlock when it is
already exclusive locked.

PR: 215963, 215122
Sponsored by: Yandex LLC
Approved by: re (delphij)


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 301502 06-Jun-2016 bz

Provide a public interface to rt_flushifroutes which takes the address
family as an argument as well.
This will be used to cleanup individual protocols during VNET teardown.

Obtained from: projects/vnet
Sponsored by: The FreeBSD Foundation


# 301217 02-Jun-2016 gnn

This change re-adds L2 caching for TCP and UDP, as originally added in D4306
but removed due to other changes in the system. Restore the llentry pointer
to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as
appropriate.

Submitted by: Mike Karels
Differential Revision: https://reviews.freebsd.org/D6262


# 297234 24-Mar-2016 bz

Fix compile errors after r297225:

- properly V_irtualise variable access unbreaking VIMAGE kernels.
- remove the volatile from the function return type to make architecture
using gcc happy [-Wreturn-type]
"type qualifiers ignored on function return type"
I am not entirely happy with this solution putting the u_int there
but it will do for now.


# 297225 24-Mar-2016 gnn

FreeBSD previously provided route caching for TCP (and UDP). Re-add
route caching for TCP, with some improvements. In particular, invalidate
the route cache if a new route is added, which might be a better match.
The cache is automatically invalidated if the old route is deleted.

Submitted by: Mike Karels
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D4306


# 294706 25-Jan-2016 melifaro

MFP r287070,r287073: split radix implementation and route table structure.

There are number of radix consumers in kernel land (pf,ipfw,nfs,route)
with different requirements. In fact, first 3 don't have _any_ requirements
and first 2 does not use radix locking. On the other hand, routing
structure do have these requirements (rnh_gen, multipath, custom
to-be-added control plane functions, different locking).
Additionally, radix should not known anything about its consumers internals.

So, radix code now uses tiny 'struct radix_head' structure along with
internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
Existing consumers still uses the same 'struct radix_node_head' with
slight modifications: they need to pass pointer to (embedded)
'struct radix_head' to all radix callbacks.

Routing code now uses new 'struct rib_head' with different locking macro:
RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
information base).

New net/route_var.h header was added to hold routing subsystem internal
data. 'struct rib_head' was placed there. 'struct rtentry' will also
be moved there soon.


# 293886 14-Jan-2016 melifaro

Remove now-unused wrappers for various routing functions.


# 293829 13-Jan-2016 melifaro

Remove RTF_RNH_LOCKED support from rtalloc1_fib().

Last caller using it was eliminated in r293471.

Sponsored by: Yandex LLC


# 293656 11-Jan-2016 melifaro

Do not rewrite all ro_flags.


# 293611 09-Jan-2016 melifaro

Fix userland build broken by r293470.

Pointy hat to: melifaro


# 293544 09-Jan-2016 melifaro

Finish r275196: do not dereference rtentry in if_output() routines.

The only piece of information that is required is rt_flags subset.

In particular, if_loop() requires RTF_REJECT and RTF_BLACKHOLE flags
to check if this particular mbuf needs to be dropped (and what
error should be returned).
Note that if_loop() will always return EHOSTUNREACH for "reject" routes
regardless of RTF_HOST flag existence. This is due to upcoming routing
changes where RTF_HOST value won't be available as lookup result.

All other functions require RTF_GATEWAY flag to check if they need
to return EHOSTUNREACH instead of EHOSTDOWN error.

There are 11 places where non-zero 'struct route' is passed to if_output().
For most of the callers (forwarding, bpf, arp) does not care about exact
error value. In fact, the only place where this result is propagated
is ip_output(). (ip6_output() passes NULL route to nd6_output_ifp()).

Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and
RT_IS_GW) and inline function (rt_update_ro_flags()) to copy necessary
rte flags to ro_flags. Call this function in ip_output() after looking up/
verifying rte.

Reviewed by: ae


# 293470 09-Jan-2016 melifaro

Remove sys/eventhandler.h from net/route.h

Reviewed by: ae


# 293466 09-Jan-2016 melifaro

(Temporarily) remove route_redirect_event eventhandler.

Such handler should pass different set of variables, instead
of directly providing 2 locked route entries.
Given that it hasn't been really used since at least 2012, remove
current code.
Will re-add it after finishing most major routing-related changes.

Discussed with: np


# 293159 04-Jan-2016 melifaro

Add rib_lookup_info() to provide API for retrieving individual route
entries data in unified format.

There are control plane functions that require information other than
just next-hop data (e.g. individual rtentry fields like flags or
prefix/mask). Given that the goal is to avoid rte reference/refcounting,
re-use rt_addrinfo structure to store most rte fields. If caller wants
to retrieve key/mask or gateway (which are sockaddrs and are allocated
separately), it needs to provide sufficient-sized sockaddrs structures
w/ ther pointers saved in passed rt_addrinfo.

Convert:
* lltable new records checks (in_lltable_rtcheck(),
nd6_is_new_addr_neighbor().
* rtsock pre-add/change route check.
* IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because
1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should
not be multiple host routes for such hosts 2) if we have multiple
routes we should inspect them (which is not done). 3) the entire idea
of abusing KRT as storage for ND proxy seems odd. Userland programs
should be used for that purpose).


# 293098 03-Jan-2016 melifaro

Handle IPV6_PATHMTU option by spliting ip6_getpmtu_ctl() from ip6_getpmtu().
Add ro_mtu field to 'struct route' to be able to pass lookup MTU back to
the caller.

Currently, ip6_getpmtu() has 2 totally different use cases:
1) control plane (IPV6_PATHMTU req), where we just need to calculate MTU
and return it, w/o any reusability.
2) Actual ip6_output() data path where we (nearly) always use the provided
route lookup data. If this data is not 'valid' we need to perform another
lookup and save the result (which cannot be re-used by ip6_output()).

Given that, handle 1) by calling separate function doing rte lookup itself.
Resulting MTU is calculated by (newly-added) ip6_calcmtu() used by both
ip6_getpmtu_ctl() and ip6_getpmtu().
For 2) instead of storing ref'ed rte, store mtu (the only needed data
from the lookup result) inside newly-added ro_mtu field.
'struct route' was shrinked by 8(or 4 bytes) in r292978. Grow it again
by 4 bytes. New ro_mtu field will be used in other places like
ip/tcp_output (EMSGSIZE handling from output routines).

Reviewed by: ae


# 292978 31-Dec-2015 melifaro

Implement interface link header precomputation API.

Add if_requestencap() interface method which is capable of calculating
various link headers for given interface. Right now there is support
for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
Other types are planned to support more complex calculation
(L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
(like L2 header for route w/gateway) down to the stack eliminating the
need for other lookups. It also brings us closer to more complex scenarios
like transparently handling MPLS nexthops and tunnel interfaces.
Last, but not least, it removes layering violation introduced by flowtable
code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
record. Interface link address change are handled by re-calculating
headers for all lles based on if_lladdr event. After these changes,
arpresolve()/nd6_resolve() returns full pre-calculated header for
supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
difference is that interface source mac has to be filled by OS for
AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
BPF and not pollute if_output() routines. Convert BPF to pass prepend data
via new 'struct route' mechanism. Note that it does not change
non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
It is not needed for ethernet anymore. The only remaining FDDI user is
dev/pdq mostly untouched since 2007. FDDI support was eliminated from
OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
Flowtable violates layering by saving (and not correctly managing)
rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
header data from that lle.

Differential Revision: https://reviews.freebsd.org/D4102


# 292333 16-Dec-2015 melifaro

Provide additional lle data in IPv6 lltable dump used by ndp(8).

Before the change, things like lle state were queried via
SIOCGNBRINFO_IN6 by ndp(8) for _each_ lle entry in dump.
This ioctl was added in 1999, probably to avoid touching rtsock code.

This change maps SIOCGNBRINFO_IN6 data to standard rtsock dump the
following way:
expire (already) maps to rtm_rmx.rmx_expire
isrouter -> rtm_flags & RTF_GATEWAY
asked -> rtm_rmx.rmx_pksent
state -> rtm_rmx.rmx_state (maps to rmx_weight via define)

Reviewed by: ae


# 291993 08-Dec-2015 melifaro

Merge helper fib* functions used for basic lookups.

Vast majority of rtalloc(9) users require only basic info from
route table (e.g. "does the rtentry interface match with the interface
I have?". "what is the MTU?", "Give me the IPv4 source address to use",
etc..).
Instead of hand-rolling lookups, checking if rtentry is up, valid,
dealing with IPv6 mtu, finding "address" ifp (almost never done right),
provide easy-to-use API hiding all the complexity and returning the
needed info into small on-stack structure.

This change also helps hiding route subsystem internals (locking, direct
rtentry accesses).
Additionaly, using this API improves lookup performance since rtentry is not
locked.
(This is safe, since all the rtentry changes happens under both radix WLOCK
and rtentry WLOCK).

Sponsored by: Yandex LLC


# 291466 30-Nov-2015 melifaro

Add new rt_foreach_fib_walk_del() function for deleting route entries
by filter function instead of picking into routing table details in
each consumer.
Remove now-unused rt_expunge() (eliminating last external RTF_RNH_LOCKED
user).
This simplifies future nexthops/mulitipath changes and rtrequest1_fib()
locking refactoring.

Actual changes:
Add "rt_chain" field to permit rte grouping while doing batched delete
from routing table (thus growing rte 200->208 on amd64).
Add "rti_filter" / "rti_filterdata" / "rti_spare" fields to rt_addrinfo
to pass filter function to various routing subsystems in standard way.
Convert all rt_expunge() customers to new rt_addinfo-based api and eliminate
rt_expunge().


# 289461 17-Oct-2015 melifaro

Remove several compat functions from pre-fib era.


# 287476 05-Sep-2015 melifaro

Constantify lookup key in ifa_ifwith* functions.
Some places in our network stack already have const
arguments (like if_output() routines and LLE functions).

Code using ifa_ifwith (and similar functins) along with
LLE/_output functions is currently bound to use tricks
like __DECONST(). Provide a cleaner way by making sockaddr
lookup key really constant.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D3464


# 286594 10-Aug-2015 melifaro

Rename rt_foreach_fib() to rt_foreach_fib_walk().

Suggested by: julian


# 286458 08-Aug-2015 melifaro

MFP r274295:

* Move interface route cleanup to route.c:rt_flushifroutes()
* Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
to use new rt_foreach_fib() instead of hand-rolling cycles.


# 274611 16-Nov-2014 melifaro

Finish r274175: do control plane MTU tracking.

Update route MTU in case of ifnet MTU change.
Add new RTF_FIXEDMTU to track explicitly specified MTU.

Old behavior:
ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU.
User has to manually update all routes.
ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU.
However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu
gets updated.

New behavior:
ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs gets updated
with new MTU unless RTF_FIXEDMTU flag set on them.
ifconfig em0 mtu 9000->1500 -> all routes in all fibs gets updated with new
MTU unless RTF_FIXEDMTU flag set on them AND rt_mtu is less than ifp mtu.

route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag.
route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag.

PR: 194238
MFC after: 1 month
CR: D1125


# 271923 21-Sep-2014 hrs

Fix build.


# 271916 21-Sep-2014 hrs

Make net.add_addr_allfibs vnet-local.


# 265279 03-May-2014 melifaro

Pass radix head ptr along with rte to rtexpunge().
Rename rtexpunge to rt_expunge().


# 265103 29-Apr-2014 melifaro

Move rt_setmetrics() from rtsock.c to route.c.
All rtsock-initiated rte creation/modification are now
performed in route.c holding radix tree write lock.
This reduces the need for per-rte mutex.

Sponsored by: Yandex LLC
MFC after: 1 month


# 264973 26-Apr-2014 melifaro

Unify sa_equal() macro usage.

MFC after: 2 weeks


# 264678 19-Apr-2014 jmg

garbage collect something that hasn't been triggered in almost 5 years...
the last consumer was removed a couple years ago...


# 263203 15-Mar-2014 glebius

Garbage collect long time obsoleted (or never used) stuff from routing API.


# 262806 05-Mar-2014 glebius

The route code used to mtx_destroy() a locked mutex before rtentry free. Now,
after r262763 it started to return locked mutexes to UMA. To fix that,
conditionally unlock the mutex in the destructor.

Tested by: "Sergey V. Dyatko" <sergey.dyatko@gmail.com>


# 262767 04-Mar-2014 glebius

Hide struct rtentry from userland.


# 262763 04-Mar-2014 glebius

- Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.

Discussed with: melifaro
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 260508 10-Jan-2014 melifaro

Simplify inet alias handling code: if we're adding/removing alias which
has the same prefix as some other alias on the same interface, use
newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg().

This eliminates the following rtsock messages:

Pinned RTM_ADD for prefix (for alias addition).
Pinned RTM_DELETE for prefix (for alias withdrawal).

Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24):

before commit, addition:

got message of size 116 on Fri Jan 10 14:13:15 2014
RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

got message of size 192 on Fri Jan 10 14:13:15 2014
RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
10.0.0.0 10.0.0.2 (255) ffff ffff ff

after commit, addition:

got message of size 116 on Fri Jan 10 13:56:26 2014
RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255

before commit, wihdrawal:

got message of size 192 on Fri Jan 10 13:58:59 2014
RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
10.0.0.0 10.0.0.2 (255) ffff ffff ff

got message of size 116 on Fri Jan 10 13:58:59 2014
RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

adter commit, withdrawal:

got message of size 116 on Fri Jan 10 14:14:11 2014
RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong
(and requires some hacks to keep prefix in route table on RTM_DELETE).

I've tested this change with quagga (no change) and bird (*).

bird alias handling is already broken in *BSD sysdep code, so nothing
changes here, too.

I'm going to MFC this change if there will be no complains about behavior
change.

While here, fix some style(9) bugs introduced by r260488
(pointed by glebius and bde).

Sponsored by: Yandex LLC
MFC after: 4 weeks


# 260488 09-Jan-2014 melifaro

Split rt_newaddrmsg_fib() into two different functions.
Adding/deleting interface addresses involves access to 3 different subsystems,
int different parts of code. Each call can fail, so reporting successful
operation by rtsock in the middle of the process error-prone.

Further split routing notification API and actual rtsock calls via creating
public-available rt_addrmsg() / rt_routemsg() functions with "private"
rtsock_* backend.

MFC after: 2 weeks


# 260460 08-Jan-2014 melifaro

Constanly use RT_ALL_FIBS everywhere instead of -1.

MFC after: 2 weeks


# 252184 24-Jun-2013 qingli

Due to the routing related networking kernel redesign work
in FBSD 8.0, interface routes have been returened to the
applications without the RTF_GATEWAY bit. This incompatibility
has caused some issues with Zebra, Qugga and the like.
This patch provides the RTF_GATEWAY flag bit in returned interface
routes so to behave similarly to pre 8.0 systems.

Reviewed by: hrs
Verified by: mackn at opendns dot com


# 248070 08-Mar-2013 melifaro

Fix long-standing issue with interface routes being unprotected:
Use RTM_PINNED flag to mark route as immutable.
Forbid deleting immutable routes without special rtrequest1_fib() flag.
Adding interface address with prefix already in route table is handled
by atomically deleting old prefix and adding interface one.

Discussed with: andre, eri
MFC after: 3 weeks


# 238092 04-Jul-2012 glebius

When ip_output()/ip6_output() is supplied a struct route *ro argument,
it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning
here: it may be supplied to provide route, and it may be supplied to
store and return to caller the route that ip_output()/ip6_output()
finds. In the latter case skipping FLOWTABLE lookup is pessimisation.

The difference between struct route filled by FLOWTABLE and filled
by rtalloc() family is that the former doesn't hold a reference on
its rtentry. Reference is hold by flow entry, and it is about to
be released in future. Thus, route filled by FLOWTABLE shouldn't
be passed to RTFREE() macro.

- Introduce new flag for struct route/route_in6, that marks route
not holding a reference on rtentry.
- Introduce new macro RO_RTFREE() that cleans up a struct route
depending on its kind.
- All callers to ip_output()/ip6_output() that do supply non-NULL
but empty route should use RO_RTFREE() to free results of
lookup.
- ip_output()/ip6_output() now do FLOWTABLE lookup always when
ro->ro_rt == NULL.

Tested by: tuexen (SCTP part)


# 233113 18-Mar-2012 bz

Hide kernel option ROUTETABLES evaluations in the implementation
rather than the header file. With this also move RT_MAXFIBS and
RT_NUMFIBS into the implemantion to avoid further usage in other
code. rt_numfibs is all that should be needed.

This allows users to change the number of FIBs from 1..RT_MAXFIBS(16)
dynamically using the tunable without the need to change the kernel
config for the maximum anymore. This means that thet multi-FIB
feature is now fully available with GENERIC kernels.
The kernel option ROUTETABLES can still be used to set the default
numbers of FIBs in absence of the tunable.

Ok.ed by: julian, hrs, melifaro
MFC after: 2 weeks


# 231852 17-Feb-2012 bz

Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:

Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by: Cisco Systems, Inc.
Reviewed by: melifaro (basically)
MFC after: 10 days


# 225837 28-Sep-2011 bz

Pass the fibnum where we need filtering of the message on the
rtsock allowing routing daemons to filter routing updates on an
rtsock per FIB.

Adjust raw_input() and split it into wrapper and a new function
taking an optional callback argument even though we only have one
consumer [1] to keep the hackish flags local to rtsock.c.

PR: kern/134931
Submitted by: multiple (see PR)
Suggested by: rwatson [1]
Reviewed by: rwatson
MFC after: 3 days


# 225698 20-Sep-2011 kmacy

Make KBI changes required for future MFCing of inpcb rtentry / llentry caching.

Reviewed by: rwatson, bz
Approved by: re (kib)


# 223359 21-Jun-2011 bz

Garbage collect never used global, sysctl, externs.

MFC after: 1 week


# 219791 20-Mar-2011 dchagin

Remove dead code.

MFC after: 1 Week


# 205222 16-Mar-2010 qingli

Verify interface up status using its link state only
if the interface has such capability. The interface
capability flag indicates whether such capability
exists. This approach is much more backward compatible.
Physical device driver changes will be part of another
commit.

Also updated the ifconfig utility to show the LINKSTATE
capability if present.

Reviewed by: rwatson, imp, juli
MFC after: 3 days


# 205024 11-Mar-2010 qingli

The if_tap interface is of IFT_ETHERNET type, but it
does not set or update the if_link_state variable.
As such RT_LINK_IS_UP() fails for the if_tap interface.

Also, the RT_LINK_IS_UP() needs to bypass all loopback
interfaces because loopback interfaces are considered
up logically as long as the system is running.

This patch fixes the above issues by setting and updating
the if_link_state variable when the tap interface is
opened or closed respectively. Similary approach is
already done in the if_tun device.

MFC after: 3 days


# 204902 08-Mar-2010 qingli

One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is for failover. The issue with the
current code, is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix, the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.

MFC after: 5 days


# 201282 30-Dec-2009 qingli

The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations due to multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

MFC after: 5 days


# 196995 08-Sep-2009 np

Add arp_update_event. This replaces route_arp_update_event, which
has not worked since the arp-v2 rewrite.

The event handler will be called with the llentry write-locked and
can examine la_flags to determine whether the entry is being added
or removed.

Reviewed by: gnn, kmacy
Approved by: gnn (mentor)
MFC after: 1 month


# 194629 22-Jun-2009 bz

Collect all VIMAGE_GLOBALS variables in one place.

No longer export rt_tables as all lookups go through
rt_tables_get_rnh().

We cannot make rt_tables (and rtstat, rttrash[1]) static as
netstat -r (-rs[1]) would stop working on a stripped
VIMAGE_GLOBALS kernel.

Reviewed by: zec
Presumably broken by: phk 13.5y ago in r12820 [1]


# 193232 01-Jun-2009 bz

Convert the two dimensional array to be malloced and introduce
an accessor function to get the correct rnh pointer back.

Update netstat to get the correct pointer using kvm_read()
as well.

This not only fixes the ABI problem depending on the kernel
option but also permits the tunable to overwrite the kernel
option at boot time up to MAXFIBS, enlarging the number of
FIBs without having to recompile. So people could just use
GENERIC now.

Reviewed by: julian, rwatson, zec
X-MFC: not possible


# 191148 16-Apr-2009 kmacy

Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by: rwatson


# 191124 15-Apr-2009 kmacy

revert RTM_VERSION change - it doesn't do what I thought it does and changing breaks
ifconfig needlessly


# 191117 15-Apr-2009 kmacy

add an llentry to struct route{_in6} to allow it to be passed around with
the rtentry


# 191080 14-Apr-2009 kmacy

Extend route command:
- add show as alias for get
- add weights to allow mpath to do more than equal cost
- add sticky / nostick to disable / re-enable per-connection load balancing

This adds a field to rt_metrics_lite so network bits of world will need to be re-built.

Reviewed by: jeli & qingli


# 187094 12-Jan-2009 qingli

Revive the RTF_LLINFO flag in route.h. The kernel code is guarded
by the new kernel option COMPAT_ROUTE_FLAGS for binary backward
compatibility. The RTF_LLDATA flag maps to the same value as RTF_LLINFO.
RTF_LLDATA is used by the arp and ndp utilities. The RTF_LLDATA flag is
always returned to the userland regardless whether the COMPAT_ROUTE_FLAGS
is defined.


# 186500 26-Dec-2008 qingli

This checkin addresses a couple of issues:
1. The "route" command allows route insertion through the interface-direct
option "-iface". During if_attach(), an sockaddr_dl{} entry is created
for the interface and is part of the interface address list. This
sockaddr_dl{} entry describes the interface in detail. The "route"
command selects this entry as the "gateway" object when the "-iface"
option is present. The "arp" and "ndp" commands also interact with the
kernel through the routing socket when adding and removing static L2
entries. The static L2 information is also provided through the
"gateway" object with an AF_LINK family type, similar to what is
provided by the "route" command. In order to differentiate between
these two types of operations, a RTF_LLDATA flag is introduced. This
flag is set by the "arp" and "ndp" commands when issuing the add and
delete commands. This flag is also set in each L2 entry returned by the
kernel. The "arp" and "ndp" command follows a convention where a RTM_GET
is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills
in the fields for a "rtm" object, which is reinjected into the kernel by
a subsequent RTM_ADD/DELETE command. The entry returend from RTM_GET
is a prefix route, so the RTF_LLDATA flag must be specified when issuing
the RTM_ADD/DELETE messages.

2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the
specification for retrieving L2 information. Also optimized the
code logic.

Reviewed by: julian


# 186119 15-Dec-2008 qingli

This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle, Kip has also been conducting
active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
me maintaining that branch before the svn conversion


# 185747 07-Dec-2008 kmacy

- convert radix node head lock from mutex to rwlock
- make radix node head lock not recursive
- fix LOR in rtexpunge
- fix LOR in rtredirect

Reviewed by: sam


# 183200 20-Sep-2008 zec

Move #defines for MRT-related constants from net/route.c to
net/route.h, because the vnet code will need those constants as
well.

Reviewed by: bz
Approved by: julian (mentor)
MFC after: never


# 183017 14-Sep-2008 julian

come on Julian, make up if you're committing one change or the other.
fix braino


# 183013 14-Sep-2008 julian

Revert a part of the MRT commit that proved un-needed.
rt_check() in its original form proved to be sufficient and
rt_check_fib() can go away (as can its evil twin in_rt_check()).

I believe this does NOT address the crashes people have been seeing
in rt_check.

MFC after: 1 week


# 182801 05-Sep-2008 julian

Be consistent about whether these multi-lined macros are separated by
a blank line. Some were, some weren't. Decide in favour of the line
as it matches what an inline would do, and it's easier to read.


# 178898 10-May-2008 julian

move a #define from a place it shouldn't have been to a place it should
have been. Basically my testign didn't ocver one case that this broke.
thanks tinderbox!


# 178888 09-May-2008 julian

Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as a EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extent that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility call setfib
that acts a bit like nice..

setfib -3 ping target.example.com # will use fib 3 for ping.

It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.

2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being reponded to.

6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect withthe proces that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.

Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes
accordingly.

ipfw has grown 2 new keywords:

setfib N ip from anay to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed it raises the possibilty of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the desination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco


# 178167 13-Apr-2008 qingli

This patch provides the back end support for equal-cost multi-path
(ECMP) for both IPv4 and IPv6. Previously, multipath route insertion
is disallowed. For example,

route add -net 192.103.54.0/24 10.9.44.1
route add -net 192.103.54.0/24 10.9.44.2

The second route insertion will trigger an error message of
"add net 192.103.54.0/24: gateway 10.2.5.2: route already in table"

Multiple default routes can also be inserted. Here is the netstat
output:

default 10.2.5.1 UGS 0 3074 bge0 =>
default 10.2.5.2 UGS 0 0 bge0

When multipath routes exist, the "route delete" command requires
a specific gateway to be specified or else an error message would
be displayed. For example,

route delete default

would fail and trigger the following error message:

"route: writing to routing socket: No such process"
"delete net default: not in table"

On the other hand,

route delete default 10.2.5.2

would be successful: "delete net default: gateway 10.2.5.2"

One does not have to specify a gateway if there is only a single
route for a particular destination.

I need to perform more testings on address aliases and multiple
interfaces that have the same IP prefixes. This patch as it
stands today is not yet ready for prime time. Therefore, the ECMP
code fragments are fully guarded by the RADIX_MPATH macro.
Include the "options RADIX_MPATH" in the kernel configuration
to enable this feature.

Reviewed by: robert, sam, gnn, julian, kmacy


# 174934 27-Dec-2007 mux

Add a workaround for a deadlock between the rt_setgate() and rt_check()
functions. It is easily triggered by running routed, and, I expect, by
running any other daemon that uses routing sockets.

Reviewed by: net@
MFC after: 1 week


# 174703 17-Dec-2007 kmacy

widen the routing event interface (arp update, redirect, and eventually pmtu change)
into separate functions

revert previous commit's changes to arpresolve and add a new interface
arpresolve2 which does arp resolution without an mbuf


# 174559 12-Dec-2007 kmacy

add interface for allowing consumers to register for ARP updates,
redirects, and path MTU changes

Reviewed by: silby


# 156750 15-Mar-2006 andre

- Fill in the correct rtm_index for RTM_ADD and RTM_CHANGE messages.

- Allow RTM_CHANGE to change a number of route flags as specified by
RTF_FMASK.

- The unused rtm_use field in struct rt_msghdr is redesignated as
rtm_fmask field to communicate route flag changes in RTM_CHANGE
messages from userland. The use count of a route was moved to
rtm_rmx a long time ago. For source code compatibility reasons
a define of rtm_use to rtm_fmask is provided.

These changes faciliate running of multiple cooperating routing
daemons at the same time without causing undesired interference.
Open[BGP|OSPF]D make use of these features to have IGP routes
override EGP ones.

Obtained from: OpenBSD (claudio@)
MFC after: 3 days


# 150130 14-Sep-2005 andre

Remove bogous semicolons at the end of the definitions of
'do { ... } while (0)' macros.

PR: kern/83088
Sumbitted by: <antoine.brodin at laposte.net>


# 139823 06-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 136155 05-Oct-2004 sam

Add 802.11-specific events that are dispatched through the routing socket.
This really doesn't belong here but is preferred (for the moment) over
adding yet another mechanism for sending msgs from the kernel to user apps.

Reviewed by: imp


# 132780 28-Jul-2004 kan

Avoid casts as lvalues.


# 128621 24-Apr-2004 luigi

document the locking behaviour of the functions that access
the routing table.


# 128454 20-Apr-2004 luigi

Document an assumption on the structure of 'struct rtentry'


# 128357 17-Apr-2004 luigi

make route_init() static


# 128185 13-Apr-2004 luigi

route.h: introduce a macro, SA_SIZE(struct sockaddr *) which returns
the space occupied by a struct sockaddr when passed through a
routing socket.
Use it to replace the macro ROUNDUP(int), that does the same but
is redefined by every file which uses it, courtesy of
the School of Cut'n'Paste Programming(TM).

(partial) userland changes to follow.


# 128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# 127828 04-Apr-2004 luigi

+ arpresolve(): remove an unused argument
+ struct ifnet: remove unused fields, move ipv6-related field close
to each other, add a pointer to l3<->l2 translation tables (arp,nd6,
etc.) for future use.

+ struct route: remove an unused field, move close to each
other some fields that might likely go away in the future


# 122922 20-Nov-2003 andre

Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table. Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination. Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)


# 122921 20-Nov-2003 andre

Remove RTF_PRCLONING from routing table and adjust users of it
accordingly. The define is left intact for ABI compatibility
with userland.

This is a pre-step for the introduction of tcp_hostcache. The
network stack remains fully useable with this change.

Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)


# 122334 08-Nov-2003 sam

replace explicit changes to rt_refcnt by RT_ADDREF and RT_REMREF
macros that expand to include assertions when the system is built
with INVARIANTS

Supported by: FreeBSD Foundation


# 121770 30-Oct-2003 sam

Overhaul routing table entry cleanup by introducing a new rtexpunge
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE). This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).

Supported by: FreeBSD Foundation


# 120727 04-Oct-2003 sam

Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents. Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
may be returned; this was never used and added unwarranted complexity
for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
we assume the parent will remain as long as the clone; doing this avoids
a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
applications cannot/do-no know about mutex's. Doing this requires
that the mutex be the last element in the structure. A better solution
is to introduce an externalized version of struct rtentry but this is
a major task because of the intertwining of rtentry and other data
structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
work to eliminate many held references. If not these will be resolved
prior to release.
3. ATM changes are untested.

Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS (partly)


# 120701 03-Oct-2003 sam

cleanups prior to adding locking (and in some cases to eliminate locking):

o move route_cb to be private to rtsock.c
o replace global static route_proto by locals
o eliminate global #define shorthands for info references
o remove some register decls
o ansi-fy function decls
o move items to be close in scope to their usage
o add rt_dispatch function for dispatching the actual message
o cleanup tangled logic for doing all-but-me msg send

Support by: FreeBSD Foundation


# 117752 18-Jul-2003 hsu

Add mutex for routing entries.

Reviewed by: bmilekic, silby


# 111926 05-Mar-2003 peter

Finish driving a stake through the heart of netns and the associated
ifdefs scattered around the place - its dead Jim!

The SMB stuff had stolen AF_NS, make it official.


# 111767 02-Mar-2003 mdodd

Reduce code duplication. This adds the function rt_check() to route.c.

Approved by: sam (in principle)


# 93084 24-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.


# 92725 19-Mar-2002 alfred

Remove __P.


# 89498 18-Jan-2002 ru

Introduce an interface announcement message for the routing
socket so that routing daemons and other interested parties
know when an interface is attached/detached.

PR: kern/33747
Obtained from: NetBSD
MFC after: 2 weeks


# 86764 22-Nov-2001 jlemon

Introduce a syncache, which enables FreeBSD to withstand a SYN flood
DoS in an improved fashion over the existing code.

Reviewed by: silby (in a previous iteration)
Sponsored by: DARPA, NAI Labs


# 85074 17-Oct-2001 ru

Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2.

Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *''
as the argument. Pass rt_addrinfo all the way down to rtrequest1
and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now
``rt_addrinfo *'' instead of ``sockaddr *'' (almost noone is
using it anyways).

Benefit: the following command now works. Previously we needed
two route(8) invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

Remove unsafe typecast in rtrequest(), from ``rtentry *'' to
``sockaddr *''. It was introduced by 4.3BSD-Reno and never
corrected.

Obtained from: BSD/OS, NetBSD
MFC after: 1 month
PR: kern/28360


# 85052 17-Oct-2001 ru

64-bit fixes from CSRG.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 80353 25-Jul-2001 fenner

Don't bother passing p to rtioctl just so it can fail to pass it to mrt_ioctl


# 63745 21-Jul-2000 jayanth

When a connection is being dropped due to a listen queue overflow,
delete the cloned route that is associated with the connection.
This does not exhaust the routing table memory when the system
is under a SYN flood attack. The route entry is not deleted if there
is any prior information cached in it.

Reviewed by: Peter Wemm,asmodai


# 57178 13-Feb-2000 peter

Clean up some loose ends in the network code, including the X.25 and ISO
#ifdefs. Clean out unused netisr's and leftover netisr linker set gunk.
Tested on x86 and alpha, including world.

Approved by: jkh


# 55205 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


# 54350 09-Dec-1999 shin

rtcalloc() is removed because it turned out not to be necessary for FreeBSD.
(It was added as a part of KAME patch)

Specified by: jdp@polstra.com


# 52904 05-Nov-1999 shin

KAME related header files additions and merges.
(only those which don't affect c source files so much)

Reviewed by: cvs-committers
Obtained from: KAME project


# 51252 13-Sep-1999 ru

Add comments, fix typos.

Reviewed by: wollman


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 48381 30-Jun-1999 msmith

Increase the size of the route reference count from 15 bits to 31 bits.

This doesn't change the size or alignment of the structure on either i386
or Alpha, and thus should be binary-compatible (modulo problems with old
applications and routes with more than 2^15 references).

Reviewed by: peter


# 46568 06-May-1999 peter

Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.


# 34823 23-Mar-1998 bde

Added a forward struct declaration so that this file is less
self-insufficient.


# 29179 07-Sep-1997 bde

Some staticized variables were still declared to be extern.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 21666 13-Jan-1997 wollman

Use the new if_multiaddrs list for multicast addresses rather than the
previous hackery involving struct in_ifaddr and arpcom. Get rid of the
abominable multi_kludge. Update all network interfaces to use the
new machanism. Distressingly few Ethernet drivers program the multicast
filter properly (assuming the hardware has one, which it usually does).


# 18839 09-Oct-1996 wollman

Get rid of obsolete RTF_MASK and RTF_CHAINDELETE flags.


# 17835 26-Aug-1996 julian

change a comment to match what the BSD4.4 book says.


# 15652 06-May-1996 wollman

Add three new route flags to help determine what sort of address
the destination represents. For IP:

- Iff it is a host route, RTF_LOCAL and RTF_BROADCAST indicate local
(belongs to this host) and broadcast addresses, respectively.

- For all routes, RTF_MULTICAST is set if the destination is multicast.

The RTF_BROADCAST flag is used by ip_output() to eliminate a call to
in_broadcast() in a common case; this gives about 1% in our packet-generation
experiments. All three flags might be used (although they aren't now)
to determine whether a packet can be forwarded; a given host route can
represent a forwardable address if:

(rt->rt_flags & (RTF_HOST | RTF_LOCAL | RTF_BROADCAST | RTF_MULTICAST))
== RTF_HOST

Obviously, one still has to do all the work if a host route is not present,
but this code allows one to cache the results of such a lookup if rtalloc1()
is called without masking RTF_PRCLONING.


# 13765 30-Jan-1996 mpp

Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.


# 12820 14-Dec-1995 phk

Another mega commit to staticize things.


# 12340 16-Nov-1995 phk

All net.* sysctl converted now.


# 11819 26-Oct-1995 julian

Reviewed by: julian and jhay@mikom.csir.co.za
Submitted by: Mike Mitchell, supervisor@alb.asctmd.com

This is a bulk mport of Mike's IPX/SPX protocol stacks and all the
related gunf that goes with it..
it is not guaranteed to work 100% correctly at this time
but as we had several people trying to work on it
I figured it would be better to get it checked in so
they could all get teh same thing to work on..

Mikes been using it for a year or so
but on 2.0

more changes and stuff will be merged in from other developers now that this is in.

Mike Mitchell, Network Engineer
AMTECH Systems Corporation, Technology and Manufacturing
8600 Jefferson Street, Albuquerque, New Mexico 87113 (505) 856-8000
supervisor@alb.asctmd.com


# 11541 16-Oct-1995 wollman

Change signature of rt->rt_output() so that it is compatible with
ifp->if_output() functions. This way, initial implementations of
rt_output functionality can just lazily use if_output until customized
versions are written.


# 9759 29-Jul-1995 bde

Eliminate sloppy common-style declarations. There should be none left for
the LINT configuation.


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 7197 20-Mar-1995 wollman

Better fix for the deletion of parents of cloned routes problem,
superseding the `nextchild' hack. This also provides a way
forward to fix RTM_CHANGE and RTM_ADD as well.


# 6245 08-Feb-1995 wollman

Define RTF_PINNED for future use.


# 6235 07-Feb-1995 wollman

Correct fix for merge conflicts: RTM_VERSION is always 5. Header files
included by user code must never depend on kernel compile options.


# 6228 07-Feb-1995 dg

Fixed unresolved CVS conflict on RTM_VERSION.


# 6223 07-Feb-1995 wollman

Merge in the socket-level support for Transaction TCP from the OLAH_TTCP
branch.

Submitted by: Andras Olah <olah@cs.utwente.nl>


# 5833 24-Jan-1995 bde

Declare `struct mbuf' with the correct scope to avoid lots of warnings
for compiling routed... Previously a kernel function pointer that is
bogusly visible to applications was incompletely declared to hide the
problem.


# 5791 23-Jan-1995 wollman

route.c: keep track of where cloned routes come from, and make sure to
delete them when the ``parent'' goes away

route.h: add glue to track this to rtentry structure. WARNING WILL ROBINSON!
This will be yet another incompatible change in your route-using binaries.
I apologize, but this was the only way to do it. I took this opportunity
to increase the size of the metrics to what I believe will be the final
length for 2.1, so that when the T/TCP stuff is done, this won't happen
again.


# 5099 13-Dec-1994 wollman

Add support for two separate cloning flags, one set by the lower layers,
and one set by the protocol family. Also add another parameter to
rtalloc1() to allow for any interface flags to be ignored; currently
this is only useful for RTF_PRCLONING. Get rid of rt_prflags and re-unite
with rt_flags. Add T/TCP ``route metrics''.

NB: YOU MUST RECOMPILE `route' AND OTHER RELATED PROGRAMS AS A RESULT OF
THIS CHANGE.

This also adds a new interface parameter, `ifi_physical', which will
eventually replace IFF_ALTPHYS as the mechanism for specifying the
particular physical connection desired on a multiple-connection card.

NB: YOU MUST RECOMPILE `ifconfig' AND OTHER RELATED PROGRAMS AS A RESULT OF
THIS CHANGE.


# 4104 02-Nov-1994 wollman

Collapse two fields so that we have space for another 32 flags.
NB: You will have to recompile programs which use the `rt_use' member in
order to get the correct values. This should not cause incorrect operation,
but the statistics may look a little confusing.


# 2168 21-Aug-1994 paul

Make idempotent.

Submitted by: Paul


# 1817 02-Aug-1994 dg

Added $Id$


# 1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources