History log of /netbsd-current/sys/net/route.h
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 1.134 16-Jun-2023 rin

Align function name in its declaration consistently.
No binary changes.


# 1.133 16-Jun-2023 rin

Consistently use __inline instead of inline, as done for rev. 1.119:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/net/route.h#rev1.119


Revision tags: netbsd-10-base bouyer-sunxi-drm-base
# 1.132 20-Sep-2022 knakahara

Remove routes on an address removal if the routes referencing to the address. Implemented by ozaki-r@n.o.

A route that has a gateway is on a connected route can be invalid if the
connected route is deleted, i.e., an associated address is removed.
Traditionally NetBSD doesn't sweep such a route on the address removal. Sending
packets over the route fails with "No route to host". Also the route holds an
orphan ifaddr as rt_ifa that is destructed say by in_purgeaddr.

If the same address is assgined again in such a state, there can be two
different ifaddr objects with the same address. Until recently it's not a
big problem because we can send packets anyway. However after MP-ification
of the network stack, we can't send packets because we strictly check if rt_ifa
(i.e., the (old) ifaddr) is valid.

This change automatically removes such routes on a removal of an associated
address to avoid keeping inconsistent routes.


# 1.131 29-Aug-2022 knakahara

Add sysctl entry to control to send routing message for RTM_DYNAMIC.

Some routing daemons require such routing message to keep coherency.

If we want to let kernel send such message, set net.inet.icmp.dynamic_rt_msg=1
for IPv4, net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
Default(=0) is the same as before, that is, not send such routing message.


# 1.130 26-Aug-2022 knakahara

Refactor: rtrequest_newmsg() is no longer used after nd6_rtr.c:r1.149

That has bumped up to 9.99.66 when nd6_rtr.c:r1.149 was commited.


# 1.129 09-Aug-2021 andvar

fix various typos in compatibility, mainly in comments.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.128 22-Mar-2021 christos

Add a list of names


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406
# 1.127 09-Mar-2020 roy

branches: 1.127.6; 1.127.8;
route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that doesn't exist on a subnet / prefix.


Revision tags: ad-namecache-base3
# 1.126 08-Feb-2020 roy

route(4): add RO_MISSFILTER socket option

This allows filtering of specific RTM_MISS destination sockaddrs.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.125 19-Sep-2019 ozaki-r

branches: 1.125.2;
Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.124 22-Aug-2019 roy

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.123 29-Apr-2019 roy

branches: 1.123.2;
Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.


# 1.122 29-Apr-2019 roy

rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.


# 1.121 29-Apr-2019 pgoyette

For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.120 30-Oct-2018 ozaki-r

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422
# 1.119 19-Apr-2018 christos

branches: 1.119.2;
s/static inline/static __inline/g for consistency.


Revision tags: pgoyette-compat-0415
# 1.118 12-Apr-2018 ozaki-r

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of a update of a rtentry.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.117 09-Jan-2018 christos

branches: 1.117.2;
Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
XXX: pullup-8


# 1.116 18-Dec-2017 ozaki-r

Show ARP/NDP caches as LLINFO not LLDATA for backward compatiblity


# 1.115 13-Dec-2017 christos

Add bit definitions for flags so that route(8) can use them.


Revision tags: tls-maxphys-base-20171202
# 1.114 21-Sep-2017 ozaki-r

Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.113 16-Jun-2017 ozaki-r

Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

branches: 1.112.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.132 20-Sep-2022 knakahara

Remove routes on an address removal if the routes referencing to the address. Implemented by ozaki-r@n.o.

A route that has a gateway is on a connected route can be invalid if the
connected route is deleted, i.e., an associated address is removed.
Traditionally NetBSD doesn't sweep such a route on the address removal. Sending
packets over the route fails with "No route to host". Also the route holds an
orphan ifaddr as rt_ifa that is destructed say by in_purgeaddr.

If the same address is assgined again in such a state, there can be two
different ifaddr objects with the same address. Until recently it's not a
big problem because we can send packets anyway. However after MP-ification
of the network stack, we can't send packets because we strictly check if rt_ifa
(i.e., the (old) ifaddr) is valid.

This change automatically removes such routes on a removal of an associated
address to avoid keeping inconsistent routes.


# 1.131 29-Aug-2022 knakahara

Add sysctl entry to control to send routing message for RTM_DYNAMIC.

Some routing daemons require such routing message to keep coherency.

If we want to let kernel send such message, set net.inet.icmp.dynamic_rt_msg=1
for IPv4, net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
Default(=0) is the same as before, that is, not send such routing message.


# 1.130 26-Aug-2022 knakahara

Refactor: rtrequest_newmsg() is no longer used after nd6_rtr.c:r1.149

That has bumped up to 9.99.66 when nd6_rtr.c:r1.149 was commited.


# 1.129 09-Aug-2021 andvar

fix various typos in compatibility, mainly in comments.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.128 22-Mar-2021 christos

Add a list of names


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406
# 1.127 09-Mar-2020 roy

branches: 1.127.6; 1.127.8;
route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that doesn't exist on a subnet / prefix.


Revision tags: ad-namecache-base3
# 1.126 08-Feb-2020 roy

route(4): add RO_MISSFILTER socket option

This allows filtering of specific RTM_MISS destination sockaddrs.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.125 19-Sep-2019 ozaki-r

branches: 1.125.2;
Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.124 22-Aug-2019 roy

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.123 29-Apr-2019 roy

branches: 1.123.2;
Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.


# 1.122 29-Apr-2019 roy

rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.


# 1.121 29-Apr-2019 pgoyette

For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.120 30-Oct-2018 ozaki-r

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422
# 1.119 19-Apr-2018 christos

branches: 1.119.2;
s/static inline/static __inline/g for consistency.


Revision tags: pgoyette-compat-0415
# 1.118 12-Apr-2018 ozaki-r

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of a update of a rtentry.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.117 09-Jan-2018 christos

branches: 1.117.2;
Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
XXX: pullup-8


# 1.116 18-Dec-2017 ozaki-r

Show ARP/NDP caches as LLINFO not LLDATA for backward compatiblity


# 1.115 13-Dec-2017 christos

Add bit definitions for flags so that route(8) can use them.


Revision tags: tls-maxphys-base-20171202
# 1.114 21-Sep-2017 ozaki-r

Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.113 16-Jun-2017 ozaki-r

Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

branches: 1.112.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.131 29-Aug-2022 knakahara

Add sysctl entry to control to send routing message for RTM_DYNAMIC.

Some routing daemons require such routing message to keep coherency.

If we want to let kernel send such message, set net.inet.icmp.dynamic_rt_msg=1
for IPv4, net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
Default(=0) is the same as before, that is, not send such routing message.


# 1.130 26-Aug-2022 knakahara

Refactor: rtrequest_newmsg() is no longer used after nd6_rtr.c:r1.149

That has bumped up to 9.99.66 when nd6_rtr.c:r1.149 was commited.


# 1.129 09-Aug-2021 andvar

fix various typos in compatibility, mainly in comments.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.128 22-Mar-2021 christos

Add a list of names


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406
# 1.127 09-Mar-2020 roy

branches: 1.127.6; 1.127.8;
route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that doesn't exist on a subnet / prefix.


Revision tags: ad-namecache-base3
# 1.126 08-Feb-2020 roy

route(4): add RO_MISSFILTER socket option

This allows filtering of specific RTM_MISS destination sockaddrs.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.125 19-Sep-2019 ozaki-r

branches: 1.125.2;
Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.124 22-Aug-2019 roy

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.123 29-Apr-2019 roy

branches: 1.123.2;
Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.


# 1.122 29-Apr-2019 roy

rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.


# 1.121 29-Apr-2019 pgoyette

For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.120 30-Oct-2018 ozaki-r

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422
# 1.119 19-Apr-2018 christos

branches: 1.119.2;
s/static inline/static __inline/g for consistency.


Revision tags: pgoyette-compat-0415
# 1.118 12-Apr-2018 ozaki-r

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of a update of a rtentry.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.117 09-Jan-2018 christos

branches: 1.117.2;
Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
XXX: pullup-8


# 1.116 18-Dec-2017 ozaki-r

Show ARP/NDP caches as LLINFO not LLDATA for backward compatiblity


# 1.115 13-Dec-2017 christos

Add bit definitions for flags so that route(8) can use them.


Revision tags: tls-maxphys-base-20171202
# 1.114 21-Sep-2017 ozaki-r

Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.113 16-Jun-2017 ozaki-r

Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

branches: 1.112.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.130 26-Aug-2022 knakahara

Refactor: rtrequest_newmsg() is no longer used after nd6_rtr.c:r1.149

That has bumped up to 9.99.66 when nd6_rtr.c:r1.149 was commited.


# 1.129 09-Aug-2021 andvar

fix various typos in compatibility, mainly in comments.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.128 22-Mar-2021 christos

Add a list of names


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406
# 1.127 09-Mar-2020 roy

branches: 1.127.6; 1.127.8;
route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that doesn't exist on a subnet / prefix.


Revision tags: ad-namecache-base3
# 1.126 08-Feb-2020 roy

route(4): add RO_MISSFILTER socket option

This allows filtering of specific RTM_MISS destination sockaddrs.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.125 19-Sep-2019 ozaki-r

branches: 1.125.2;
Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.124 22-Aug-2019 roy

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.123 29-Apr-2019 roy

branches: 1.123.2;
Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.


# 1.122 29-Apr-2019 roy

rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.


# 1.121 29-Apr-2019 pgoyette

For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.120 30-Oct-2018 ozaki-r

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422
# 1.119 19-Apr-2018 christos

branches: 1.119.2;
s/static inline/static __inline/g for consistency.


Revision tags: pgoyette-compat-0415
# 1.118 12-Apr-2018 ozaki-r

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of a update of a rtentry.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.117 09-Jan-2018 christos

branches: 1.117.2;
Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
XXX: pullup-8


# 1.116 18-Dec-2017 ozaki-r

Show ARP/NDP caches as LLINFO not LLDATA for backward compatiblity


# 1.115 13-Dec-2017 christos

Add bit definitions for flags so that route(8) can use them.


Revision tags: tls-maxphys-base-20171202
# 1.114 21-Sep-2017 ozaki-r

Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.113 16-Jun-2017 ozaki-r

Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

branches: 1.112.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.129 09-Aug-2021 andvar

fix various typos in compatibility, mainly in comments.


Revision tags: thorpej-i2c-spi-conf2-base thorpej-futex2-base thorpej-cfargs2-base cjep_sun2x-base1 cjep_sun2x-base cjep_staticlib_x-base1 cjep_staticlib_x-base thorpej-i2c-spi-conf-base thorpej-cfargs-base thorpej-futex-base
# 1.128 22-Mar-2021 christos

Add a list of names


Revision tags: bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406
# 1.127 09-Mar-2020 roy

branches: 1.127.6; 1.127.8;
route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that doesn't exist on a subnet / prefix.


Revision tags: ad-namecache-base3
# 1.126 08-Feb-2020 roy

route(4): add RO_MISSFILTER socket option

This allows filtering of specific RTM_MISS destination sockaddrs.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.125 19-Sep-2019 ozaki-r

branches: 1.125.2;
Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.124 22-Aug-2019 roy

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.123 29-Apr-2019 roy

branches: 1.123.2;
Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.


# 1.122 29-Apr-2019 roy

rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.


# 1.121 29-Apr-2019 pgoyette

For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.120 30-Oct-2018 ozaki-r

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422
# 1.119 19-Apr-2018 christos

branches: 1.119.2;
s/static inline/static __inline/g for consistency.


Revision tags: pgoyette-compat-0415
# 1.118 12-Apr-2018 ozaki-r

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of a update of a rtentry.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.117 09-Jan-2018 christos

branches: 1.117.2;
Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
XXX: pullup-8


# 1.116 18-Dec-2017 ozaki-r

Show ARP/NDP caches as LLINFO not LLDATA for backward compatiblity


# 1.115 13-Dec-2017 christos

Add bit definitions for flags so that route(8) can use them.


Revision tags: tls-maxphys-base-20171202
# 1.114 21-Sep-2017 ozaki-r

Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.113 16-Jun-2017 ozaki-r

Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

branches: 1.112.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.128 22-Mar-2021 christos

Add a list of names


Revision tags: thorpej-cfargs-base thorpej-futex-base bouyer-xenpvh-base2 phil-wifi-20200421 bouyer-xenpvh-base1 phil-wifi-20200411 bouyer-xenpvh-base is-mlppp-base phil-wifi-20200406
# 1.127 09-Mar-2020 roy

route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that doesn't exist on a subnet / prefix.


Revision tags: ad-namecache-base3
# 1.126 08-Feb-2020 roy

route(4): add RO_MISSFILTER socket option

This allows filtering of specific RTM_MISS destination sockaddrs.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.125 19-Sep-2019 ozaki-r

branches: 1.125.2;
Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.124 22-Aug-2019 roy

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.123 29-Apr-2019 roy

branches: 1.123.2;
Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.


# 1.122 29-Apr-2019 roy

rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.


# 1.121 29-Apr-2019 pgoyette

For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.120 30-Oct-2018 ozaki-r

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422
# 1.119 19-Apr-2018 christos

branches: 1.119.2;
s/static inline/static __inline/g for consistency.


Revision tags: pgoyette-compat-0415
# 1.118 12-Apr-2018 ozaki-r

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of a update of a rtentry.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.117 09-Jan-2018 christos

branches: 1.117.2;
Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
XXX: pullup-8


# 1.116 18-Dec-2017 ozaki-r

Show ARP/NDP caches as LLINFO not LLDATA for backward compatiblity


# 1.115 13-Dec-2017 christos

Add bit definitions for flags so that route(8) can use them.


Revision tags: tls-maxphys-base-20171202
# 1.114 21-Sep-2017 ozaki-r

Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.113 16-Jun-2017 ozaki-r

Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

branches: 1.112.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.127 09-Mar-2020 roy

route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that doesn't exist on a subnet / prefix.


Revision tags: ad-namecache-base3
# 1.126 08-Feb-2020 roy

route(4): add RO_MISSFILTER socket option

This allows filtering of specific RTM_MISS destination sockaddrs.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.125 19-Sep-2019 ozaki-r

branches: 1.125.2;
Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.124 22-Aug-2019 roy

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.123 29-Apr-2019 roy

branches: 1.123.2;
Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.


# 1.122 29-Apr-2019 roy

rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.


# 1.121 29-Apr-2019 pgoyette

For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.120 30-Oct-2018 ozaki-r

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422
# 1.119 19-Apr-2018 christos

branches: 1.119.2;
s/static inline/static __inline/g for consistency.


Revision tags: pgoyette-compat-0415
# 1.118 12-Apr-2018 ozaki-r

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of a update of a rtentry.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.117 09-Jan-2018 christos

branches: 1.117.2;
Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
XXX: pullup-8


# 1.116 18-Dec-2017 ozaki-r

Show ARP/NDP caches as LLINFO not LLDATA for backward compatiblity


# 1.115 13-Dec-2017 christos

Add bit definitions for flags so that route(8) can use them.


Revision tags: tls-maxphys-base-20171202
# 1.114 21-Sep-2017 ozaki-r

Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.113 16-Jun-2017 ozaki-r

Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

branches: 1.112.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.126 08-Feb-2020 roy

route(4): add RO_MISSFILTER socket option

This allows filtering of specific RTM_MISS destination sockaddrs.


Revision tags: ad-namecache-base2 ad-namecache-base1 ad-namecache-base phil-wifi-20191119
# 1.125 19-Sep-2019 ozaki-r

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.124 22-Aug-2019 roy

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.123 29-Apr-2019 roy

branches: 1.123.2;
Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.


# 1.122 29-Apr-2019 roy

rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.


# 1.121 29-Apr-2019 pgoyette

For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.120 30-Oct-2018 ozaki-r

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422
# 1.119 19-Apr-2018 christos

branches: 1.119.2;
s/static inline/static __inline/g for consistency.


Revision tags: pgoyette-compat-0415
# 1.118 12-Apr-2018 ozaki-r

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of a update of a rtentry.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.117 09-Jan-2018 christos

branches: 1.117.2;
Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
XXX: pullup-8


# 1.116 18-Dec-2017 ozaki-r

Show ARP/NDP caches as LLINFO not LLDATA for backward compatiblity


# 1.115 13-Dec-2017 christos

Add bit definitions for flags so that route(8) can use them.


Revision tags: tls-maxphys-base-20171202
# 1.114 21-Sep-2017 ozaki-r

Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.113 16-Jun-2017 ozaki-r

Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

branches: 1.112.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.125 19-Sep-2019 ozaki-r

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@


# 1.124 22-Aug-2019 roy

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.123 29-Apr-2019 roy

branches: 1.123.2;
Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.


# 1.122 29-Apr-2019 roy

rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.


# 1.121 29-Apr-2019 pgoyette

For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.120 30-Oct-2018 ozaki-r

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422
# 1.119 19-Apr-2018 christos

branches: 1.119.2;
s/static inline/static __inline/g for consistency.


Revision tags: pgoyette-compat-0415
# 1.118 12-Apr-2018 ozaki-r

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of a update of a rtentry.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.117 09-Jan-2018 christos

branches: 1.117.2;
Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
XXX: pullup-8


# 1.116 18-Dec-2017 ozaki-r

Show ARP/NDP caches as LLINFO not LLDATA for backward compatiblity


# 1.115 13-Dec-2017 christos

Add bit definitions for flags so that route(8) can use them.


Revision tags: tls-maxphys-base-20171202
# 1.114 21-Sep-2017 ozaki-r

Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.113 16-Jun-2017 ozaki-r

Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

branches: 1.112.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.124 22-Aug-2019 roy

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9


Revision tags: netbsd-9-base phil-wifi-20190609
# 1.123 29-Apr-2019 roy

Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.


# 1.122 29-Apr-2019 roy

rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.


# 1.121 29-Apr-2019 pgoyette

For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.120 30-Oct-2018 ozaki-r

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422
# 1.119 19-Apr-2018 christos

branches: 1.119.2;
s/static inline/static __inline/g for consistency.


Revision tags: pgoyette-compat-0415
# 1.118 12-Apr-2018 ozaki-r

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of a update of a rtentry.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.117 09-Jan-2018 christos

branches: 1.117.2;
Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
XXX: pullup-8


# 1.116 18-Dec-2017 ozaki-r

Show ARP/NDP caches as LLINFO not LLDATA for backward compatiblity


# 1.115 13-Dec-2017 christos

Add bit definitions for flags so that route(8) can use them.


Revision tags: tls-maxphys-base-20171202
# 1.114 21-Sep-2017 ozaki-r

Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.113 16-Jun-2017 ozaki-r

Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

branches: 1.112.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.123 29-Apr-2019 roy

Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.


# 1.122 29-Apr-2019 roy

rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.


# 1.121 29-Apr-2019 pgoyette

For the rtsock compat code, make sure we create the "oroute" sysctl
tree. Otherwise a 5.2 version of getifaddrs(2) gets errors.

This makes the 5.2 version of ifconfig(8) behave the same on both
NetBSD-8 and -current. HOWEVER, both of them print nothing (for
``ifconfig -l'' command) so there's still a bug somewhere.

As reported originally by der Mouse.


Revision tags: isaki-audio2-base pgoyette-compat-20190127 pgoyette-compat-20190118 pgoyette-compat-1226 pgoyette-compat-1126
# 1.120 30-Oct-2018 ozaki-r

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.


Revision tags: pgoyette-compat-1020 pgoyette-compat-0930 pgoyette-compat-0906 pgoyette-compat-0728 phil-wifi-base pgoyette-compat-0625 pgoyette-compat-0521 pgoyette-compat-0502 pgoyette-compat-0422
# 1.119 19-Apr-2018 christos

s/static inline/static __inline/g for consistency.


Revision tags: pgoyette-compat-0415
# 1.118 12-Apr-2018 ozaki-r

Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock. It also improves the atomicity of a update of a rtentry.


Revision tags: pgoyette-compat-0407 pgoyette-compat-0330 pgoyette-compat-0322 pgoyette-compat-0315 pgoyette-compat-base
# 1.117 09-Jan-2018 christos

branches: 1.117.2;
Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
XXX: pullup-8


# 1.116 18-Dec-2017 ozaki-r

Show ARP/NDP caches as LLINFO not LLDATA for backward compatiblity


# 1.115 13-Dec-2017 christos

Add bit definitions for flags so that route(8) can use them.


Revision tags: tls-maxphys-base-20171202
# 1.114 21-Sep-2017 ozaki-r

Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.113 16-Jun-2017 ozaki-r

Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

branches: 1.112.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-2-RELEASE netbsd-7-1-2-RELEASE netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.117 09-Jan-2018 christos

Use a queue of deferred entries to delete routes instead of a fixed stack
of 10. Otherwise we can overflow in route deletions from the rexmit timer.
XXX: pullup-8


# 1.116 18-Dec-2017 ozaki-r

Show ARP/NDP caches as LLINFO not LLDATA for backward compatiblity


# 1.115 13-Dec-2017 christos

Add bit definitions for flags so that route(8) can use them.


Revision tags: tls-maxphys-base-20171202
# 1.114 21-Sep-2017 ozaki-r

Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.113 16-Jun-2017 ozaki-r

Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

branches: 1.112.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-1-1-RELEASE netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.114 21-Sep-2017 ozaki-r

Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid, otherwise invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.


Revision tags: nick-nhusb-base-20170825 perseant-stdc-iso10646-base
# 1.113 16-Jun-2017 ozaki-r

Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

branches: 1.112.4;
Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.113 16-Jun-2017 ozaki-r

Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@


Revision tags: netbsd-8-base prg-localcount2-base3 prg-localcount2-base2 prg-localcount2-base1 prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


Revision tags: prg-localcount2-base pgoyette-localcount-20170426 bouyer-socketcan-base1
# 1.112 11-Apr-2017 roy

Add RO_MSGFILTER socket option to PF_ROUTE to filter out
un-wanted route(4) messages.

Inspired by the ROUTE_MSGFILTER equivalent in OpenBSD,
but with an API which allows the full range of potential message types.


Revision tags: jdolecek-ncq-base pgoyette-localcount-20170320 nick-nhusb-base-20170204 bouyer-socketcan-base pgoyette-localcount-20170107
# 1.111 19-Dec-2016 roy

branches: 1.111.2;
Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-1-RELEASE netbsd-7-1-RC2 netbsd-7-nhusb-base-20170116 netbsd-7-1-RC1 netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision


# 1.111 19-Dec-2016 roy

Fix gcc complaining about int to unsigned long conversion issues by
explictly marking as unsigned in RT_ROUNDUP2.


# 1.110 16-Dec-2016 christos

Can't hide stuff from userland, because struct route is embedded in other
structures (like inpcb) and things like fstat stop working.


# 1.109 12-Dec-2016 ozaki-r

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview


# 1.108 08-Dec-2016 ozaki-r

Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference couting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.


Revision tags: nick-nhusb-base-20161204
# 1.107 15-Nov-2016 ozaki-r

Don't use rt_walktree to delete routes

Some functions use rt_walktree to scan the routing table and delete
matched routes. However, we shouldn't use rt_walktree to delete
routes because rt_walktree is recursive to the routing table (radix
tree) and isn't friendly to MP-ification. rt_walktree allows a caller
to pass a callback function to delete an matched entry. The callback
function is called from an API of the radix tree (rn_walktree) but
also calls an API of the radix tree to delete an entry.

This change adds a new API of the radix tree, rn_search_matched,
which returns a matched entry that is selected by a callback
function passed by a caller and the caller itself deletes the
entry. By using the API, we can avoid the recursive form.


Revision tags: pgoyette-localcount-20161104
# 1.106 25-Oct-2016 ozaki-r

Remove unnecessary argument

No functional change.


# 1.105 21-Oct-2016 ozaki-r

Make some rt_timer functions and variables static

No functional change.


# 1.104 18-Oct-2016 ozaki-r

Remove unused rtcache_lookup_noclone


Revision tags: nick-nhusb-base-20161004
# 1.103 21-Sep-2016 roy

Add ifam_pid and ifam_addrflags to ifa_msghdr.
Re-version RTM_NEWADDR, RTM_DELADDR, RTM_CHGADDR and NET_RT_IFLIST.
Add compat code for old version.


Revision tags: localcount-20160914 pgoyette-localcount-20160806
# 1.102 01-Aug-2016 ozaki-r

Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.


Revision tags: pgoyette-localcount-20160726 pgoyette-localcount-base nick-nhusb-base-20160907 nick-nhusb-base-20160529
# 1.101 28-Apr-2016 ozaki-r

branches: 1.101.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.


# 1.100 26-Apr-2016 ozaki-r

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.


Revision tags: nick-nhusb-base-20160422
# 1.99 11-Apr-2016 ozaki-r

Don't use radix tree API directly


# 1.98 04-Apr-2016 ozaki-r

Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html


# 1.97 24-Mar-2016 ozaki-r

Constify rt_newmsg's arguments


Revision tags: nick-nhusb-base-20160319 nick-nhusb-base-20151226 nick-nhusb-base-20150921
# 1.96 02-Sep-2015 ozaki-r

Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.


# 1.95 31-Aug-2015 ozaki-r

Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.


# 1.94 31-Aug-2015 ozaki-r

Make rt_refcnt take into account rt_timer


# 1.93 24-Aug-2015 ozaki-r

Add an assertion; if rtcache has an rtentry, its refcnt must be > 0


# 1.92 17-Jul-2015 ozaki-r

Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)


Revision tags: nick-nhusb-base-20150606
# 1.91 30-Apr-2015 ozaki-r

Make some functions static

- rtflushall
- rtcache_clear
- rtcache_invalidate

And pull these static inline functions in route.c

- rt_destroy
- rt_setkey


Revision tags: nick-nhusb-base-20150406
# 1.90 06-Apr-2015 ozaki-r

Classify and sort prototype declarations

No functional change.


# 1.89 06-Apr-2015 ozaki-r

Make rt_maskedcopy static


# 1.88 23-Mar-2015 roy

Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.


# 1.87 26-Feb-2015 roy

Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.


# 1.86 25-Feb-2015 roy

Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.


# 1.85 24-Feb-2015 roy

Clean comments and style.


Revision tags: netbsd-7-0-2-RELEASE netbsd-7-nhusb-base netbsd-7-0-1-RELEASE netbsd-7-0-RELEASE netbsd-7-0-RC3 netbsd-7-0-RC2 netbsd-7-0-RC1 nick-nhusb-base netbsd-7-base tls-earlyentropy-base tls-maxphys-base
# 1.84 06-Jun-2014 rmind

branches: 1.84.4;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.


Revision tags: yamt-pagecache-base9 rmind-smpnet-nbase rmind-smpnet-base
# 1.83 26-Apr-2014 pooka

It's been > 20years since rtioctl() did something. Let's just
remove that special way of returning EOPNOTSUPP.


Revision tags: riastradh-xf86-video-intel-2-7-1-pre-2-21-15 riastradh-drm2-base3 riastradh-drm2-base2 riastradh-drm2-base1 riastradh-drm2-base agc-symver-base
# 1.82 01-Mar-2013 joerg

branches: 1.82.6; 1.82.10;
Retire OSI network stack. OK core@


Revision tags: yamt-pagecache-base8 yamt-pagecache-base7 yamt-pagecache-base6 jmcneill-usbmp-base10 yamt-pagecache-base5 jmcneill-usbmp-base9 yamt-pagecache-base4 jmcneill-usbmp-base8 jmcneill-usbmp-base7 jmcneill-usbmp-base6 jmcneill-usbmp-base5 jmcneill-usbmp-base4 jmcneill-usbmp-base3
# 1.81 18-Feb-2012 rmind

branches: 1.81.2;
rt_setkey: remove invalid assert, sockaddr_dup() may fail if no memory.


Revision tags: netbsd-6-0-6-RELEASE netbsd-6-1-5-RELEASE netbsd-6-1-4-RELEASE netbsd-6-0-5-RELEASE netbsd-6-1-3-RELEASE netbsd-6-0-4-RELEASE netbsd-6-1-2-RELEASE netbsd-6-0-3-RELEASE netbsd-6-1-1-RELEASE netbsd-6-0-2-RELEASE netbsd-6-1-RELEASE netbsd-6-1-RC4 netbsd-6-1-RC3 netbsd-6-1-RC2 netbsd-6-1-RC1 netbsd-6-0-1-RELEASE matt-nb6-plus-nbase netbsd-6-0-RELEASE netbsd-6-0-RC2 matt-nb6-plus-base netbsd-6-0-RC1 jmcneill-usbmp-pre-base2 jmcneill-usbmp-base2 netbsd-6-base jmcneill-usbmp-base jmcneill-audiomp3-base
# 1.80 11-Nov-2011 gdt

branches: 1.80.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicated
that host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).,

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.


Revision tags: yamt-pagecache-base3 yamt-pagecache-base2 yamt-pagecache-base rmind-uvmplock-nbase cherry-xenmp-base rmind-uvmplock-base
# 1.79 31-Mar-2011 dyoung

branches: 1.79.4;
Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)


Revision tags: bouyer-quota2-nbase bouyer-quota2-base
# 1.78 01-Feb-2011 matt

Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.


# 1.77 26-Jan-2011 dyoung

Update comment on RTM_CHGADDR to describe better what it's for.


Revision tags: jruoho-x86intr-base matt-mips64-premerge-20101231
# 1.76 12-Nov-2010 roy

branches: 1.76.2; 1.76.4;
Add RTM_CHGADDR to signal that an address on the interface has changed.
This is mainly used for notifying userland about active link address changes.


Revision tags: uebayasi-xip-base4 uebayasi-xip-base3 yamt-nfs-mp-base11 uebayasi-xip-base2 yamt-nfs-mp-base10
# 1.75 26-Jun-2010 kefren

Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33


Revision tags: uebayasi-xip-base1 yamt-nfs-mp-base9 uebayasi-xip-base matt-premerge-20091211
# 1.74 03-Nov-2009 dyoung

branches: 1.74.2; 1.74.4;
s/u_quad_t/uint64_t/.


Revision tags: yamt-nfs-mp-base8 yamt-nfs-mp-base7 jymxensuspend-base yamt-nfs-mp-base6 yamt-nfs-mp-base5 yamt-nfs-mp-base4 jym-xensuspend-nbase yamt-nfs-mp-base3 nick-hppapmap-base4 nick-hppapmap-base3 jym-xensuspend-base nick-hppapmap-base
# 1.73 02-Apr-2009 christos

Centralize the ROUNDUP and ADVANCE macro in a header file, give them an
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.

XXX: All this should be pulled up to 5.0


Revision tags: nick-hppapmap-base2 mjf-devfs2-base
# 1.72 11-Jan-2009 christos

branches: 1.72.2;
merge christos-time_t


Revision tags: haad-dm-base2 haad-nbase2 ad-audiomp2-base christos-time_t-nbase haad-dm-base christos-time_t-base
# 1.71 07-Nov-2008 dyoung

*** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.


Revision tags: netbsd-5-0-RC3 netbsd-5-0-RC2 netbsd-5-0-RC1 netbsd-5-base matt-mips64-base2 haad-dm-base1 wrstuden-revivesa-base-4 wrstuden-revivesa-base-3 wrstuden-revivesa-base-2 wrstuden-revivesa-base-1 simonb-wapbl-nbase yamt-pf42-base4 simonb-wapbl-base yamt-pf42-base3 hpcarm-cleanup-nbase yamt-pf42-baseX yamt-pf42-base2 yamt-nfs-mp-base2 wrstuden-revivesa-base yamt-nfs-mp-base yamt-pf42-base ad-socklock-base1
# 1.70 26-Mar-2008 ad

branches: 1.70.2; 1.70.6; 1.70.12; 1.70.14; 1.70.16;
Defer processing of routing messages to a soft interrupt. These can be
generated at IPL_VM and it's not safe to call directly into the socket
layer at that level. Reviewed by matt@.


Revision tags: yamt-lazymbuf-base15 yamt-lazymbuf-base14 keiichi-mipv6-nbase nick-net80211-sync-base keiichi-mipv6-base matt-armv6-nbase hpcarm-cleanup-base
# 1.69 20-Feb-2008 matt

branches: 1.69.2; 1.69.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)


Revision tags: mjf-devfs-base
# 1.68 11-Feb-2008 simonb

Don't look for <stdbool.h> if compiling _STANDALONE as well.


Revision tags: bouyer-xeni386-nbase
# 1.67 21-Jan-2008 dyoung

struct route is part of the kernel ABI (!!!), so move it back
outside of #ifdef _KERNEL. #include stdbool.h if !_KERNEL.


# 1.66 21-Jan-2008 dyoung

Move struct route inside of #ifdef _KERNEL to protect userland from
it.


# 1.65 21-Jan-2008 dyoung

In rtflushall(), do not clear a route cache by removing its rtentry
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.


Revision tags: bouyer-xeni386-base
# 1.64 14-Jan-2008 dyoung

Use rtcache_validate() instead of rtcache_getrt(). Delete rtcache_getrt().

In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.


# 1.63 12-Jan-2008 dyoung

Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().


# 1.62 11-Jan-2008 dyoung

Cosmetic: remove redundant 'not' from a comment, re-wrap lines.


# 1.61 10-Jan-2008 dyoung

Make many void rtcache_X() routines return struct rtentry *, so
that we can make many back-to-back rtcache_X();rtcache_getrt()
calls into one rtcache_X() call.


Revision tags: matt-armv6-base
# 1.60 04-Jan-2008 dyoung

Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.


Revision tags: vmlocking2-base3
# 1.59 20-Dec-2007 dyoung

Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.


Revision tags: nick-csl-alignment-base5 matt-armv6-prevmlocking yamt-kmem-base3 cube-autoconf-base yamt-kmem-base2 yamt-kmem-base vmlocking2-base2 reinoud-bufcleanup-nbase vmlocking2-base1 jmcneill-base bouyer-xenamd64-base2 vmlocking-nbase yamt-x86pmap-base4 bouyer-xenamd64-base yamt-x86pmap-base3 yamt-x86pmap-base2 yamt-x86pmap-base jmcneill-pm-base reinoud-bufcleanup-base vmlocking-base
# 1.58 27-Aug-2007 dyoung

branches: 1.58.2; 1.58.8; 1.58.10; 1.58.14;
Add a new routing message type, RTM_SETGATE. We can use an
RTM_SETGATE message to ask the link layer to fill in the link-layer
nexthop before we try to detect a duplicate route in a multipath-capable
kernel.


Revision tags: matt-mips64-base
# 1.57 19-Jul-2007 dyoung

branches: 1.57.4; 1.57.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.


Revision tags: nick-csl-alignment-base mjf-ufs-trans-base
# 1.56 09-Jun-2007 dyoung

branches: 1.56.2;
Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.


Revision tags: yamt-idlelwp-base8
# 1.55 06-May-2007 dyoung

Factor rtcache_lookup2() out of rtcache_lookup1(), for re-use in
the IPv6 stack. rtcache_lookup2() takes an int * argument that it
writes with 1 if we had a cache 'hit', 0 if there was a cache
'miss'.


# 1.54 02-May-2007 dyoung

Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.


# 1.53 22-Apr-2007 xtraeme

rtcache_clear is defined as static void in route.c, but it's used
in netinet/in_route.c. Move the prototype into route.h to fix
the build.


Revision tags: thorpej-atomic-base
# 1.52 04-Mar-2007 christos

branches: 1.52.2; 1.52.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.


Revision tags: ad-audiomp-base
# 1.51 17-Feb-2007 dyoung

KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.


Revision tags: post-newlock2-merge newlock2-nbase newlock2-base
# 1.50 05-Jan-2007 joerg

branches: 1.50.2;
Add a debug option for the route cache to help tracing down issues
like PR 35272 and 35318. When the kernel is compiled with
-DRTCACHE_DEBUG, all rtcache entries are logged to a list with the place
they got initialised. This allows overwrites, double inits and other
manual messing to be detected.


Revision tags: yamt-splraiseipl-base5 yamt-splraiseipl-base4
# 1.49 15-Dec-2006 joerg

Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cashed and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either a valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is sementically equivalent though.
etherip doesn't need sc_route_expire similiar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.


Revision tags: yamt-splraiseipl-base3
# 1.48 09-Dec-2006 dyoung

Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.


# 1.47 07-Dec-2006 joerg

Deinline rt_get_ifa. Keep it in route.c as it is part of the routing
API, even though rtsock.c is the only user right now.


# 1.46 07-Dec-2006 joerg

Deinline rt_replace_ifa and move rt_set_ifa and rt_set_ifa1 to
route.c as they are not used outside that file.


Revision tags: netbsd-4-0-1-RELEASE wrstuden-fixsa-newbase wrstuden-fixsa-base-1 netbsd-4-0-RELEASE netbsd-4-0-RC5 matt-nb4-arm-base netbsd-4-0-RC4 netbsd-4-0-RC3 netbsd-4-0-RC2 netbsd-4-0-RC1 wrstuden-fixsa-base netbsd-4-base
# 1.45 13-Nov-2006 dyoung

Fix bugs in rt_get_ifa() and put aside the sequence number stuff,
which isn't ready for primetime yet.


# 1.44 13-Nov-2006 dyoung

Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.


Revision tags: abandoned-netbsd-4-base yamt-splraiseipl-base2 yamt-splraiseipl-base yamt-pdpolicy-base9 yamt-pdpolicy-base8 yamt-pdpolicy-base7 yamt-pdpolicy-base6 chap-midi-nbase gdamore-uart-base simonb-timcounters-final yamt-pdpolicy-base5 chap-midi-base yamt-pdpolicy-base4 yamt-pdpolicy-base3 peter-altq-base yamt-pdpolicy-base2 elad-kernelauth-base yamt-pdpolicy-base yamt-uio_vmspace-base5 simonb-timecounters-base rpaulo-netinet-merge-pcb-base
# 1.43 11-Dec-2005 christos

branches: 1.43.20; 1.43.22;
merge ktrace-lwp.


Revision tags: ktrace-lwp-base
# 1.42 10-Dec-2005 elad

Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.


Revision tags: yamt-readahead-base3 yamt-readahead-base2 yamt-readahead-pervnode yamt-readahead-perfile yamt-readahead-base yamt-vop-base3 yamt-vop-base2 thorpej-vnode-attr-base yamt-vop-base
# 1.41 22-Jun-2005 dyoung

branches: 1.41.2;
Resolve conflicts in importation of 18-May-2005 ath(4) / net80211(9)
from FreeBSD. Introduce compatibility shims (sys/dev/ic/ath_netbsd.[ch],
sys/net80211/ieee80211_netbsd.[ch]). Update drivers (an, atu, atw,
awi, ipw, iwi, rtw, wi) for the new net80211(9) API.


# 1.40 29-May-2005 christos

- sprinkle const
- remove unneeded casts
- use more mem*() instead of b*() funcs.


Revision tags: netbsd-3-1-1-RELEASE netbsd-3-0-3-RELEASE netbsd-3-1-RELEASE netbsd-3-0-2-RELEASE netbsd-3-1-RC4 netbsd-3-1-RC3 netbsd-3-1-RC2 netbsd-3-1-RC1 netbsd-3-0-1-RELEASE netbsd-3-0-RELEASE netbsd-3-0-RC6 netbsd-3-0-RC5 netbsd-3-0-RC4 netbsd-3-0-RC3 netbsd-3-0-RC2 netbsd-3-0-RC1 yamt-km-base4 yamt-km-base3 netbsd-3-base kent-audio2-base
# 1.39 26-Feb-2005 perry

nuke trailing whitespace


Revision tags: yamt-km-base2 yamt-km-base kent-audio1-beforemerge kent-audio1-base
# 1.38 21-Apr-2004 matt

branches: 1.38.4; 1.38.6;
ANSI-fy and some additional de-__P and constification.


# 1.37 21-Apr-2004 matt

Constify if.c radix.c and route.c (and fix related fallout).


Revision tags: netbsd-2-0-3-RELEASE netbsd-2-1-RELEASE netbsd-2-1-RC6 netbsd-2-1-RC5 netbsd-2-1-RC4 netbsd-2-1-RC3 netbsd-2-1-RC2 netbsd-2-1-RC1 netbsd-2-0-2-RELEASE netbsd-2-0-1-RELEASE netbsd-2-base netbsd-2-0-RELEASE netbsd-2-0-RC5 netbsd-2-0-RC4 netbsd-2-0-RC3 netbsd-2-0-RC2 netbsd-2-0-RC1 netbsd-2-0-base
# 1.36 07-Aug-2003 agc

Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.


# 1.35 29-Jun-2003 fvdl

branches: 1.35.2;
Back out the lwp/ktrace changes. They contained a lot of colateral damage,
and need to be examined and discussed more.


# 1.34 28-Jun-2003 darrenr

Pass lwp pointers throughtout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V


# 1.33 18-Jan-2003 wiz

bandwidth, not bandwith.


Revision tags: nathanw_sa_before_merge fvdl_fs64_base gmcgarry_ctxsw_base gmcgarry_ucred_base nathanw_sa_base
# 1.32 12-Nov-2002 itojun

remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.


# 1.31 12-Nov-2002 itojun

add an argument to rt_timer_remove_all(), to specify if we need to call
timeout routine on removal.


# 1.30 02-Nov-2002 perry

/*CONTCOND*/ while (0)'ed macros


Revision tags: kqueue-aftermerge kqueue-beforemerge netbsd-1-6-RELEASE netbsd-1-6-RC3 netbsd-1-6-RC2 netbsd-1-6-RC1 netbsd-1-6-base gehenna-devsw-base kqueue-base
# 1.29 12-May-2002 matt

branches: 1.29.4;
Eliminate more commons.


Revision tags: eeh-devprop-base newlock-base ifpoll-base thorpej-mips-cache-base thorpej-devvp-base3 thorpej-devvp-base2 post-chs-ubcperf pre-chs-ubcperf thorpej-devvp-base thorpej_scsipi_beforemerge thorpej_scsipi_nbase thorpej_scsipi_base
# 1.28 08-Mar-2001 enami

branches: 1.28.2;
- lineup comment.
- fix typo in comment.


# 1.27 21-Feb-2001 itojun

branches: 1.27.2;
use u_quad_t for rtstat.
not sure if it really matters, but short (32K) looks way too small given
recent fat pipes connecting *BSD boxes, and our great uptime :-).


# 1.26 27-Jan-2001 itojun

cleanup cloned route when parent route (RTF_CLONING) goes away.
adds rt_parent to link parent from child (like NRL did, ours do refcnt
rt_refcnt properly).

bsdi rt_walkbranch would speedup the processing, but since the code will not
be visited too frequently, the current code (with rt_walktree) should be okay.


# 1.25 27-Jan-2001 itojun

mark cloned routes with RTF_CLONED. present it with netstat -r by "c".

let static routes overwrite cloned routes, as cloned routes can come back again
if necessary. behavior same as freebsd/bsdi, code partially from bsdi42.
(NRL rt->rt_parent was not added)
should fix PR 11916 and maybe some other PRs with ARP behavior.

recompilation of usr.sbin/route6d is suggested.


# 1.24 17-Jan-2001 itojun

pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.


# 1.23 09-Dec-2000 itojun

update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case


Revision tags: netbsd-1-5-RELEASE netbsd-1-5-BETA2 netbsd-1-5-BETA netbsd-1-5-ALPHA2 netbsd-1-5-base minoura-xpg4dl-base
# 1.22 04-May-2000 ragge

branches: 1.22.4;
Change rt_refcnt from short to int, to allow more than 32k routes thru
one interface without unexpected side effects.


# 1.21 06-Mar-2000 thorpej

- Add link status to if_data, so that routing daemons and other interested
parties can easily know the state of a link.
- Define an interface announcement message for the routing socket so that
routing daemons and other interested parties know when an interface
is attached/detached.


Revision tags: chs-ubc2-newbase wrstuden-devbsize-19991221 wrstuden-devbsize-base
# 1.20 19-Nov-1999 bouyer

Update protocoles and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.


Revision tags: comdex-fall-1999-base fvdl-softdep-base chs-ubc2-base
# 1.19 30-Jul-1999 itojun

branches: 1.19.2; 1.19.8;
remove reference to in6_systm.h (file itself will be removed afterwords)


# 1.18 01-Jul-1999 itojun

IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.


Revision tags: netbsd-1-4-PATCH003 netbsd-1-4-PATCH002 netbsd-1-4-PATCH001 netbsd-1-4-RELEASE netbsd-1-4-base
# 1.17 27-Dec-1998 thorpej

branches: 1.17.4; 1.17.6;
Simplify the rttimer code somewhat; use TAILQs instead of CIRCLEQs (we
didn't really need to traverse the queues backwards anyhow), and other
minor code simplification.


# 1.16 10-Dec-1998 christos

IPX counters and centralize statistics routine.


Revision tags: kenh-if-detach-base chs-ubc-base
# 1.15 25-Aug-1998 thorpej

Use do { ... } while (0) in RTFREE().


Revision tags: eeh-paddr_t-base
# 1.14 02-May-1998 thorpej

Need <sys/socket.h> to stand alone.


# 1.13 29-Apr-1998 thorpej

Oops, we depend on <sys/queue.h>.


# 1.12 29-Apr-1998 kml

Add generic route timeout functionality; used by path MTU discovery code


Revision tags: netbsd-1-3-PATCH001 netbsd-1-3-RELEASE netbsd-1-3-BETA netbsd-1-3-base thorpej-signal-base marc-pcmcia-bp marc-pcmcia-base
# 1.11 02-Apr-1997 christos

branches: 1.11.8;
Sync with Lite2.


Revision tags: is-newarp-before-merge is-newarp-base
# 1.10 22-May-1996 mycroft

Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.


Revision tags: netbsd-1-2-RELEASE netbsd-1-2-BETA netbsd-1-2-base
# 1.9 13-Feb-1996 christos

branches: 1.9.4;
Net prototypes


Revision tags: netbsd-1-1-PATCH001 netbsd-1-1-RELEASE netbsd-1-1-base
# 1.8 26-Mar-1995 jtc

KERNEL -> _KERNEL


# 1.7 08-Mar-1995 cgd

fixed sized types, where appropriate. when casting pointers to
integers to do math on them, cast to long. ioctl commands are
u_longs.


Revision tags: netbsd-1-0-PATCH06 netbsd-1-0-PATCH05 netbsd-1-0-PATCH04 netbsd-1-0-PATCH03 netbsd-1-0-PATCH02 netbsd-1-0-PATCH1 netbsd-1-0-PATCH0 netbsd-1-0-RELEASE netbsd-1-0-base
# 1.6 29-Jun-1994 cgd

New RCS ID's, take two. they're more aesthecially pleasant, and use 'NetBSD'


# 1.5 13-May-1994 mycroft

Update to 4.4-Lite networking code, with a few local changes.


# 1.4 11-May-1994 mycroft

Update to RTM version 3. Add prototypes. Add some new constants which are
not used yet.


Revision tags: magnum-base netbsd-0-9-patch-001 netbsd-0-9-RELEASE netbsd-0-9-BETA netbsd-0-9-ALPHA2 netbsd-0-9-ALPHA netbsd-0-9-base
# 1.3 20-May-1993 cgd

add rcs ids to everything, and clean up headers


# 1.2 19-Apr-1993 mycroft

Add consistent multiple-inclusion protection.


# 1.1 21-Mar-1993 cgd

branches: 1.1.1;
Initial revision