History log of /freebsd-11-stable/sys/net/if_var.h
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 359024 16-Mar-2020 brooks

MFC r358592:

Expose ifr_buffer_get_(buffer|length) outside if.c.

This is a preparatory commit for D23933.

Reviewed by: jhb
Obtained from: CheriBSD
Sponsored by: DARPA


# 341884 12-Dec-2018 hselasky

MFC r339012:
For changing the MTU on tun/tap devices, it should not matter whether it
is done via using ifconfig, which uses a SIOCSIFMTU ioctl() command, or
doing it using a TUNSIFINFO/TAPSIFINFO ioctl() command.
Without this patch, for IPv6 the new MTU is not used when creating routes.
Especially, when initiating TCP connections after increasing the MTU,
the old MTU is still used to compute the MSS.
Thanks to ae@ and bz@ for helping to improve the patch.

Reviewed by: ae@, bz@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D17180


# 333101 30-Apr-2018 hselasky

MFC r333015:
Add network device event for priority code point, PCP, changes.

When the PCP is changed for either a VLAN network interface or when
prio tagging is enabled for a regular ethernet network interface,
broadcast the IFNET_EVENT_PCP event so applications like ibcore can
update its GID tables accordingly.

Reviewed by: ae, kib
Differential Revision: https://reviews.freebsd.org/D15040
Sponsored by: Mellanox Technologies


# 332991 25-Apr-2018 kib

MFC r331622:
Allow to specify PCP on packets not belonging to any VLAN.

Sponsored by: Mellanox Technologies


# 332288 08-Apr-2018 brooks

MFC r331797:

Use an accessor function to access ifr_data.

This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size).

Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14900


# 331722 29-Mar-2018 eadler

Revert r330897:

This was intended to be a non-functional change. It wasn't. The commit
message was thus wrong. In addition it broke arm, and merged crypto
related code.

Revert with prejudice.

This revert skips files touched in r316370 since that commit was since
MFCed. This revert also skips files that require $FreeBSD$ property
changes.

Thank you to those who helped me get out of this mess including but not
limited to gonzo, kevans, rgrimes.

Requested by: gjb (re)


# 330897 14-Mar-2018 eadler

Partial merge of the SPDX changes

These changes are incomplete but are making it difficult
to determine what other changes can/should be merged.

No objections from: pfg


# 318504 18-May-2017 rpokala

Persistently store NIC's hardware MAC address, and add a way to retrive it

jhb pointed out that (struct ifnet) is part of the network driver KBI, and
thus the offsets of internal fields must not change. Therefore, move the new
"if_hw_addr" field to the end, and consume one of the "if_pspare"s; that's
what they're there for. The new field replaces the *last* element of that
array; that way, offsetof(if_pspare) and offsetof(if_ispare) are unchanged
compared to before r318397.

PR: 194386
Reviewed by: jhb
Pointyhat to: rpokala
Sponsored by: Panasas


# 318397 17-May-2017 rpokala

MFC r318160, 318176: Persistently store NIC's hardware MAC address, and add
a way to retrive it

The MAC address reported by `ifconfig ${nic} ether' does not always match
the address in the hardware, as reported by the driver during attach. In
particular, NICs which are components of a lagg(4) interface all report the
same MAC.

When attaching, the NIC driver passes the MAC address it read from the
hardware as an argument to ether_ifattach(). Keep a second copy of it, and
create ioctl(SIOCGHWADDR) to return it. Teach `ifconfig' to report it along
with the active MAC address.

PR: 194386


# 314090 22-Feb-2017 dexuan

MFC: 312687, 312916

Approved by: sephe (mentor)

r312687
ifnet: introduce event handlers for ifup/ifdown events

Hyper-V's NIC SR-IOV implementation needs a Hyper-V synthetic NIC and
a VF NIC to work together, mainly to support seamless live migration.

When the VF device becomes UP (or DOWN), the synthetic NIC driver needs
to switch the data path from the synthetic NIC to the VF (or the opposite).

So the synthetic NIC driver needs to know when a VF device is becoming
UP or DOWN and hence the patch is made.

Reviewed by: sephe
Approved by: sephe (mentor)
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8963

r312916
ifnet: move the new ifnet_event EVENTHANDLER_DECLARE to net/if_var.h

Thank glebius for pointing this out:
"The network stuff shall not be added to sys/eventhandler.h"

Reviewed by: David_A_Bright_DELL.com, sephe, glebius
Approved by: sephe (mentor)
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D9345


# 314089 22-Feb-2017 dexuan

revert the MFC r314085

Sorry, I generated a wrong commit log for r314085 due to a copy&pasate mistake.
Let me revert it and I'll redo the MFC.


# 314085 22-Feb-2017 dexuan

MFC: 312687, 312688

Approved by: sephe (mentor)

r312687
ifnet: introduce event handlers for ifup/ifdown events

Hyper-V's NIC SR-IOV implementation needs a Hyper-V synthetic NIC and
a VF NIC to work together, mainly to support seamless live migration.

When the VF device becomes UP (or DOWN), the synthetic NIC driver needs
to switch the data path from the synthetic NIC to the VF (or the opposite).

So the synthetic NIC driver needs to know when a VF device is becoming
UP or DOWN and hence the patch is made.

Reviewed by: sephe
Approved by: sephe (mentor)
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8963

r312688
hyperv/hn: add the support for VF drivers (SR-IOV)

Hyper-V's NIC SR-IOV implementation needs a Hyper-V synthetic NIC and
a VF NIC to work together (both NICs have the same MAC address), mainly to
support seamless live migration.

When the VF device becomes UP (or DOWN), the synthetic NIC driver needs
to switch the data path from the synthetic NIC to the VF (or the opposite).

Note: multicast/broadcast packets are still received through the synthetic
NIC and we need to inject the packets through the VF interface (if the VF is
UP), even if the synthetic NIC is DOWN (so we need to force the rxfilter
to be NDIS_PACKET_TYPE_PROMISCUOUS, when the VF is UP).

Reviewed by: sephe
Approved by: sephe (mentor)
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8964


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 302153 23-Jun-2016 np

Add spares to struct ifnet and socket for packet pacing and/or general
use. Update comments regarding the spare fields in struct inpcb.

Bump __FreeBSD_version for the changes to the size of the structures.

Reviewed by: gnn@
Approved by: re@ (gjb@)
Sponsored by: Chelsio Communications


# 300164 18-May-2016 bz

Rather than having the if_vmove() code intermixed in the vnet_destroy()
function in vnet.c move it to if.c where it logically belongs and put
it under a VNET_SYSUNINIT() call.
To not change the current behaviour make sure it runs first thing
during teardown. In the future this will allow us more flexibility
on changing the order on when we want to get rid of interfaces.

Stop exporting if_vmove() and make it file static.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6438


# 300113 18-May-2016 scottl

Import the 'iflib' API library for network drivers. From the author:

"iflib is a library to eliminate the need for frequently duplicated device
independent logic propagated (poorly) across many network drivers."

Participation is purely optional. The IFLIB kernel config option is
provided for drivers that want to transition between legacy and iflib
modes of operation. ixl and ixgbe driver conversions will be committed
shortly. We hope to see participation from the Broadcom and maybe
Chelsio drivers in the near future.

Submitted by: mmacy@nextbsd.org
Reviewed by: gallatin
Differential Revision: D5211


# 292978 31-Dec-2015 melifaro

Implement interface link header precomputation API.

Add if_requestencap() interface method which is capable of calculating
various link headers for given interface. Right now there is support
for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
Other types are planned to support more complex calculation
(L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with is length)
to prepend to mbuf.

These two changes permits routing code to pass pre-calculated nexthop data
(like L2 header for route w/gateway) down to the stack eliminating the
need for other lookups. It also brings us closer to more complex scenarios
like transparently handling MPLS nexthops and tunnel interfaces.
Last, but not least, it removes layering violation introduced by flowtable
code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
record. Interface link address change are handled by re-calculating
headers for all lles based on if_lladdr event. After these changes,
arpresolve()/nd6_resolve() returns full pre-calculated header for
supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
difference is that interface source mac has to be filled by OS for
AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
BPF and not pollute if_output() routines. Convert BPF to pass prepend data
via new 'struct route' mechanism. Note that it does not change
non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
It is not needed for ethernet anymore. The only remaining FDDI user is
dev/pdq mostly untouched since 2007. FDDI support was eliminated from
OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
Flowtable violates layering by saving (and not correctly managing)
rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
header data from that lle.

Differential Revision: https://reviews.freebsd.org/D4102


# 292402 17-Dec-2015 smh

Revert r292275 & r292379

glebius has concerns about these changes so reverting those can be discussed
and addressed.

Sponsored by: Multiplay


# 292275 15-Dec-2015 smh

Fix lagg failover due to missing notifications

When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results is slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire a
ifnet_link_event which are now monitored by rip and nd6.

Upon receiving these events each protocol trigger the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behavour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR: 156226
MFC after: 1 month
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4111


# 291292 25-Nov-2015 ae

Overhaul if_enc(4) and make it loadable in run-time.

Use hhook(9) framework to achieve ability of loading and unloading
if_enc(4) kernel module. INET and INET6 code on initialization registers
two helper hooks points in the kernel. if_enc(4) module uses these helper
hook points and registers its hooks. IPSEC code uses these hhook points
to call helper hooks implemented in if_enc(4).


# 287851 16-Sep-2015 melifaro

Unify loopback route switching:
* prepare gateway before insertion
* use RTM_CHANGE instead of explicit find/change route
* Remove fib argument from ifa_switch_loopback_route added in r264887:
if old ifp fib differes from new one, that the caller
is doing something wrong
* Make ifa_*_loopback_route call single ifa_maintain_loopback_route().


# 287775 14-Sep-2015 hselasky

Update TSO limits to include all headers.

To make driver programming easier the TSO limits are changed to
reflect the values used in the BUSDMA tag a network adapter driver is
using. The TCP/IP network stack will subtract space for all linklevel
and protocol level headers and ensure that the full mbuf chain passed
to the network adapter fits within the given limits.

Implementation notes:

If a network adapter driver needs to fixup the first mbuf in order to
support VLAN tag insertion, the size of the VLAN tag should be
subtracted from the TSO limit. Else not.

Network adapters which typically inline the complete header mbuf could
technically transmit one more segment. This patch does not implement a
mechanism to recover the last segment for data transmission. It is
believed when sufficiently large mbuf clusters are used, the segment
limit will not be reached and recovering the last segment will not
have any effect.

The current TSO algorithm tries to send MTU-sized packets, where the
MTU typically is 1500 bytes, which gives 1448 bytes of TCP data
payload per packet for IPv4. That means if the TSO length limitiation
is set to 65536 bytes, there will be a data payload remainder of
(65536 - 1500) mod 1448 bytes which is equal to 324 bytes. Trying to
recover total TSO length due to inlining mbuf header data will not
have any effect, because adding or removing the ETH/IP/TCP headers
to or from 324 bytes will not cause more or less TCP payload to be
TSO'ed.

Existing network adapter limits will be updated separately.

Differential Revision: https://reviews.freebsd.org/D3458
Reviewed by: rmacklem
MFC after: 2 weeks


# 287476 05-Sep-2015 melifaro

Constantify lookup key in ifa_ifwith* functions.
Some places in our network stack already have const
arguments (like if_output() routines and LLE functions).

Code using ifa_ifwith (and similar functins) along with
LLE/_output functions is currently bound to use tricks
like __DECONST(). Provide a cleaner way by making sockaddr
lookup key really constant.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D3464


# 281613 16-Apr-2015 glebius

Move ALTQ from contrib to net/altq. The ALTQ code is for many years
discontinued by its initial authors. In FreeBSD the code was already
slightly edited during the pf(4) SMP project. It is about to be edited
more in the projects/ifnet. Moving out of contrib also allows to remove
several hacks to the make glue.

Reviewed by: net@


# 279342 26-Feb-2015 glebius

Hide struct ifmultiaddr under _KERNEL, too.


# 279030 19-Feb-2015 glebius

Now that all users of _WANT_IFADDR are fixed, remove this crutch and
hide ifaddr, in_ifaddr and in6_ifaddr under _KERNEL.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 274376 11-Nov-2014 hselasky

Fix some minor TSO issues:
- Improve description of TSO limits.
- Remove a not needed KASSERT()
- Remove some not needed variable casts.

Sponsored by: Mellanox Technologies
Discussed with: lstewart @
MFC after: 1 week


# 274306 09-Nov-2014 glebius

Use standard mtx(9), rwlock(9), sx(9) system initialization macros
instead of doing initialization manually.

Sponsored by: Nginx, Inc.
Sponsored by: Netflix


# 274175 06-Nov-2014 melifaro

Make checks for rt_mtu generic:

Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce
route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking
might be an option in some situation, it is not feasible to do MTU checks
there: generic (or per-domain) routing code is perfectly capable of doing
this.

We currrently have 3 places where MTU is altered:

1) route addition.
In this case domain overrides radix _addroute callback (in[6]_addroute)
and all necessary checks/fixes are/can be done there.

2) route change (especially, GW change).
In this case, there are no explicit per-domain calls, but one can
override rte by setting ifa_rtrequest hook to domain handler
(inet6 does this).

3) ifconfig ifaceX mtu YYYY
In this case, we have no callbacks, but ip[6]_output performes runtime
checks and decreases rt_mtu if necessary.

Generally, the goals are to be able to handle all MTU changes in
control plane, not in runtime part, and properly deal with increased
interface MTU.

This commit changes the following:
* removes hooks setting MTU from drivers side
* adds proper per-doman MTU checks for case 1)
* adds generic MTU check for case 2)

* The latter is done by using new dom_ifmtu callback since
if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size.
However, IPv6 mtu might be different from if_mtu one (e.g. default 1280)
for some cases, so we need an abstract way to know maximum MTU size
for given interface and domain.
* moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies
user-supplied data which must be checked.
* removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to
use this functions on new non-inserted rte.

More changes will follow soon.

MFC after: 1 month
Sponsored by: Yandex LLC


# 274047 03-Nov-2014 hselasky

Clarify TSO segment limit comment and remove two TABs to make lines a
bit shorter.

Sponsored by: Mellanox Technologies


# 272261 28-Sep-2014 bz

Move the unconditional #include of net/ifq.h to the very end of file.
This seems to allow us to pass a universe with either clang or gcc
after r272244 (and r272260) and probably makes it easier to untabgle
these chained #includes in the future.


# 272257 28-Sep-2014 glebius

- Remove empty wrappers ether_poll_[de]register_drv(). [1]
- Move polling(9) declarations out of ifq.h back to if_var.h
they are absolutely unrelated to queues.

Submitted by: Mikhail <mp lenta.ru> [1]


# 272244 28-Sep-2014 glebius

Finally, convert counters in struct ifnet to counter(9).

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 272211 27-Sep-2014 melifaro

Use underlying ports counters to get lagg statistics instead of
per-packet accounting.
This introduce user-visible changes like aggregating error counters.

Reviewed by: asomers (prev.version), glebius
CR: D781
MFC after: 2 weeks
Sponsored by: Yandex LLC


# 271946 22-Sep-2014 hselasky

Improve transmit sending offload, TSO, algorithm in general.

The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

Reviewed by: adrian, rmacklem
Sponsored by: Mellanox Technologies
MFC after: 1 week


# 271783 18-Sep-2014 glebius

Remove a bunch of methods that are superseded by if_inc_counter().

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 271774 18-Sep-2014 glebius

While not too late rename 'ifnet_counter' to 'ift_counter'. One of the
imporant moments that we discussed with Marcel and Anuranjan was that
a converted driver should return false for 'grep ifnet if_driver.c' :)

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 271770 18-Sep-2014 glebius

Add a function to set if_get_counter method for an ifnet. To be used
in the drivers that are already converted to "Juniper drvapi". This
can be revisited in future.


# 271752 18-Sep-2014 glebius

While not too late rename if_get_counter_compat() to if_get_counter_default().
The compat counters will go away, but the function will remain in its place,
and in all places where it is going to be called.

Discussed with: melifaro


# 271751 18-Sep-2014 glebius

Add if_inc_counter(), a generic method to update ifnet(9) counter
w/o dereferencing the struct.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 271551 13-Sep-2014 hselasky

Revert r271504. A new patch to solve this issue will be made.

Suggested by: adrian @


# 271504 13-Sep-2014 hselasky

Improve transmit sending offload, TSO, algorithm in general.

The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

MFC after: 1 week
Sponsored by: Mellanox Technologies


# 271438 11-Sep-2014 asomers

Revisions 264905 and 266860 added a "int fib" argument to ifa_ifwithnet and
ifa_ifwithdstaddr. For the sake of backwards compatibility, the new
arguments were added to new functions named ifa_ifwithnet_fib and
ifa_ifwithdstaddr_fib, while the old functions became wrappers around the
new ones that passed RT_ALL_FIBS for the fib argument. However, the
backwards compatibility is not desired for FreeBSD 11, because there are
numerous other incompatible changes to the ifnet(9) API. We therefore
decided to remove it from head but leave it in place for stable/9 and
stable/10. In addition, this commit adds the fib argument to
ifa_ifwithbroadaddr for consistency's sake.

sys/sys/param.h
Increment __FreeBSD_version

sys/net/if.c
sys/net/if_var.h
sys/net/route.c
Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib
versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute.

sys/net/route.c
sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_options.c
sys/netinet/ip_output.c
sys/netinet6/nd6.c
Fixup calls of modified functions.

share/man/man9/ifnet.9
Document changed API.

CR: https://reviews.freebsd.org/D458
MFC after: Never
Sponsored by: Spectra Logic


# 270877 31-Aug-2014 glebius

Toss fields so that no padding field is required to achieve alignment.


# 270876 31-Aug-2014 glebius

It is actually possible to have if_t a typedef to non-void type,
and keep both converted to drvapi and non-converted drivers
compilable.

o Make if_t typedef to struct ifnet *.
o Remove shim functions.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 270874 31-Aug-2014 glebius

Provide pointer from struct ifnet to struct netmap_adapter,
instead of abusing spare field.


# 270870 31-Aug-2014 glebius

o Remove struct if_data from struct ifnet. Now it is merely API structure
for route(4) socket and ifmib(4) sysctl.
o Move fields from if_data to ifnet, but keep all statistic counters
separate, since they should disappear later.
o Provide function if_data_copy() to fill if_data, utilize it in routing
socket and ifmib handler.
o Provide overridable ifnet(9) method to fetch counters. If no provided,
if_get_counters_compat() would be used, that returns old counters.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 269243 29-Jul-2014 glebius

Garbage collect couple of unused fields from struct ifaddr:
- ifa_claim_addr() unused since removal of NetAtalk
- ifa_metric seems to be never utilized, always a copy of if_metric


# 266974 02-Jun-2014 marcel

Introduce a procedural interface to the ifnet structure. The new
interface allows the ifnet structure to be defined as an opaque
type in NIC drivers. This then allows the ifnet structure to be
changed without a need to change or recompile NIC drivers.

Put differently, NIC drivers can be written and compiled once and
be used with different network stack implementations, provided of
course that those network stack implementations have an API and
ABI compatible interface.

This commit introduces the 'if_t' type to replace 'struct ifnet *'
as the type of a network interface. The 'if_t' type is defined as
'void *' to enable the compiler to perform type conversion to
'struct ifnet *' and vice versa where needed and without warnings.
The functions that implement the API are the only functions that
need to have an explicit cast.

The MII code has been converted to use the driver API to avoid
unnecessary code churn. Code churn comes from having to work with
both converted and unconverted drivers in correlation with having
callback functions that take an interface. By converting the MII
code first, the callback functions can be defined so that the
compiler will perform the typecasts automatically.

As soon as all drivers have been converted, the if_t type can be
redefined as needed and the API functions can be fix to not need
an explicit cast.

The immediate benefactors of this change are:
1. Juniper Networks - The network stack implementation in Junos
is entirely different from FreeBSD's one and this change
allows Juniper to build "stock" NIC drivers that can be used
in combination with both the FreeBSD and Junos stacks.
2. FreeBSD - This change opens the door towards changing ifnet
and implementing new features and optimizations in the network
stack without it requiring a change in the many NIC drivers
FreeBSD has.

Submitted by: Anuranjan Shukla <anshukla@juniper.net>
Reviewed by: glebius@
Obtained from: Juniper Networks, Inc.


# 266860 29-May-2014 asomers

Fix unintended KBI change from r264905. Add _fib versions of
ifa_ifwithnet() and ifa_ifwithdstaddr() The legacy functions will call the
_fib() versions with RT_ALL_FIBS, preserving legacy behavior.

sys/net/if_var.h
sys/net/if.c
Add legacy-compatible functions as described above. Ensure legacy
behavior when RT_ALL_FIBS is passed as fibnum.

sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/net/route.c
sys/net/rtsock.c
sys/netinet6/nd6.c
Call with _fib() functions if we must use a specific fib, or the
legacy functions otherwise.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
Improve the udp_dontroute test. The bug that this test exercises is
that ifa_ifwithnet() will return the wrong address, if multiple
interfaces have addresses on the same subnet but with different
fibs. The previous version of the test only considered one possible
failure mode: that ifa_ifwithnet_fib() might fail to find any
suitable address at all. The new version also checks whether
ifa_ifwithnet_fib() finds the correct address by checking where the
ARP request goes.

Reported by: bz, hrs
Reviewed by: hrs
MFC after: 1 week
X-MFC-with: 264905
Sponsored by: Spectra Logic


# 264905 24-Apr-2014 asomers

Fix subnet and default routes on different FIBs on the same subnet.

These two bugs are closely related. The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr. Those
functions will only return an address whose interface fib equals the
argument.

sys/net/route.c
Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
arguments.

sys/netinet/in.c
Update in_addprefix to consider the interface fib when adding
prefixes. This will prevent it from not adding a subnet route when
one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
In some cases it there wasn't a clear specific fib number to use.
In others, I was unable to test those functions so I chose
RT_DEFAULT_FIB to minimize divergence from current behavior. I will
fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
Revert r263738. The udp_dontroute test was right all along.
However, bugs kern/187550 and kern/187553 cancelled each other out
when it came to this test. Because of kern/187553, ifa_ifwithnet
searched the default fib instead of the requested one, but because
of kern/187550, there was an applicable subnet route on the default
fib. The new test added in r263738 doesn't work right, however. I
can verify with dtrace that ifa_ifwithnet returned the wrong address
before I applied this commit, but route(8) miraculously found the
correct interface to use anyway. I don't know how.

Clear expected failure messages for kern/187550 and kern/187552.

PR: kern/187550
PR: kern/187552
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic


# 264887 24-Apr-2014 asomers

Fix host and network routes for new interfaces when net.add_addr_allfibs=0

sys/net/route.c
In rtinit1, use the interface fib instead of the process fib. The
latter wasn't very useful because ifconfig(8) is usually invoked
with the default process fib. Changing ifconfig(8) to use setfib(2)
would be redundant, because it already sets the interface fib.

tests/sys/netinet/fibs_test.sh
Clear the expected ATF failure

sys/net/if.c
Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib

sys/netinet/in.c
sys/net/if_var.h
Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
in_scrubprefix. Pass it the interface fib.

PR: kern/187549
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation


# 263413 20-Mar-2014 np

Add a shorter alias for if_data.ifi_oqdrops.


# 263102 13-Mar-2014 glebius

Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.

o Make all fields of struct if_data fixed machine independent size. The
notion of data (packet counters, etc) are by no means MD. And it is a
bug that on amd64 we've got a 64-bit counters, while on i386 32-bit,
which at modern speeds overflow within a second.

This also removes quite a lot of COMPAT_FREEBSD32 code.

o Give 16 bit for the ifi_datalen field. This field was provided to
make future changes to if_data less ABI breaking. Unfortunately the
8 bit size of it had effectively limited sizeof if_data to 256 bytes.

o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.

__FreeBSD_version bumped.

Discussed with: emax
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 257690 05-Nov-2013 glebius

In complemence to ifa_add_loopback_route() and ifa_del_loopback_route()
provide function ifa_switch_loopback_route() that will be used in case when
an interface address used for a loopback route goes away, but we have another
interface address with same address value and want to preserve loopback
route.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 257689 05-Nov-2013 glebius

Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by
default from the very beginning. It was placed in wrong namespace
net.link.ether, originally it had been at another wrong namespace. It was
incorrectly documented at incorrect manual page arp(8). Since new-ARP commit,
the tunable have been consulted only on route addition, and ignored on route
deletion. Behaviour of a system with tunable turned off is not fully correct,
and has no advantages comparing to normal behavior.


# 257455 31-Oct-2013 andre

Make struct ifnet readable and comprehensible again by grouping
and ordering related variables, fields and locks next to each
other. Add more comments to variables.

Over time 'ifnet' has accumlated a lot of additional pointers and
functionality in an unstructured way making it quite hard to read
and understand while obfuscating relationships between fields and
variables.

Quantify the structure size and how bloated it has become.

This is only a mechanical change in preparation for upcoming
work to make ifnet opaque to drivers and to separate out the
interface queuing.

Sponsored by: The FreeBSD Foundation


# 257351 29-Oct-2013 andre

Move all interface queue related structures, macros and definitions
from net/if_var to it own new net/ifq.h.

For now net/ifq.h is unconditionally included through net/if_var.h.

This is a mechanical change in preparation to make struct ifnet and
the individual interface queue mechanisms opaque.

Discussed with: glebius
Sponsored by: The FreeBSD Foundation


# 257271 28-Oct-2013 glebius

Style: s/SYS_EVENTHANDLER_H/_SYS_EVENTHANDLER_H_/g

Submitted by: bde


# 257244 28-Oct-2013 glebius

- Make the prophecy from 1997 happen and remove if_var.h inclusion
from if.h.
- Remove unnecessary includes and declarations from if.h
- Remove unnecessary includes and declarations from if_var.h [1]
- Mark some declarations that are about to be removed in near
future with comments, explaning why this declaration is still
necessary.
- Protect eventhandler declarations with #ifdef SYS_EVENTHANDLER_H.

Obtained from: bdeBSD [1]
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 256525 15-Oct-2013 glebius

- Utilize counter(9) to accumulate statistics on interface addresses. Add
four counters to struct ifaddr. This kills '+=' on a variables shared
between processors for every packet.
- Nuke struct if_data from struct ifaddr.
- In ip_input() do not put a reference on ifaddr, instead update statistics
right now in place and do IN_IFADDR_RUNLOCK(). These removes atomic(9)
for every packet. [1]
- To properly support NET_RT_IFLISTL sysctl used by getifaddrs(3), in
rtsock.c fill if_data fields using counter_u64_fetch().
- Accidentially fix bug in COMPAT_32 version of NET_RT_IFLISTL, which
took if_data not from the ifaddr, but from ifaddr's ifnet. [2]

Submitted by: melifaro [1], pluknet[2]
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 256522 15-Oct-2013 glebius

Push some defines under _KERNEL, improve styling and comments.


# 256521 15-Oct-2013 glebius

Remove ifa_mtx. It was used only in one place in kernel, and ifnet's
ifaddr lock can substitute it there.

Discussed with: melifaro, ae
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 256519 15-Oct-2013 glebius

Remove ifa_init() and provide ifa_alloc() that will allocate and setup
struct ifaddr internally.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 256518 15-Oct-2013 glebius

Hide 'struct ifaddr' definition from userland. Two tools left that use it,
namely ipftest(1) and ifmcstat(1). These sniff structure definition using
_WANT_IFADDR define.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 252854 05-Jul-2013 cperciva

Fix typo: minmum -> minimum.

Submitted by: @z3ndrag0n


# 251296 03-Jun-2013 andre

Allow drivers to specify a maximum TSO length in bytes if they are
limited in the amount of data they can handle at once.

Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to
change the limit.

The lowest allowable size is IP_MAXPACKET / 8 (8192 bytes) as anything
less wouldn't be very useful anymore. The upper limit is still at
IP_MAXPACKET (65536 bytes). Raising it requires further auditing of
the IPv4/v6 code path's as the length field in the IP header would
overflow leading to confusion in firewalls and others packet handler on
the real size of the packet.

The placement into "struct ifnet" is a bit hackish but the best place
that was found. When the stack/driver boundary is updated it should
be handled in a better way.

Submitted by: cperciva (earlier version)
Reviewed by: cperciva
Tested by: cperciva
MFC after: 1 week (using spare struct members to preserve ABI)


# 250300 06-May-2013 andre

Back out r249318, r249320 and r249327 due to a heisenbug most
likely related to a race condition in the ipi_hash_lock with
the exact cause currently unknown but under investigation.


# 249925 26-Apr-2013 glebius

Add const qualifier to the dst parameter of the ifnet if_output method.


# 249327 10-Apr-2013 glebius

Fix build.


# 249318 09-Apr-2013 andre

Change certain heavily used network related mutexes and rwlocks to
reside on their own cache line to prevent false sharing with other
nearby structures, especially for those in the .bss segment.

NB: Those mutexes and rwlocks with variables next to them that get
changed on every invocation do not benefit from their own cache line.
Actually it may be net negative because two cache misses would be
incurred in those cases.


# 246659 11-Feb-2013 glebius

Resolve source address selection in presense of CARP. Add a couple
of helper functions:

- carp_master() - boolean function which is true if an address
is in the MASTER state.
- ifa_preferred() - boolean function that compares two addresses,
and is aware of CARP.

Utilize ifa_preferred() in ifa_ifwithnet().

The previous version of patch also changed source address selection
logic in jails using carp_master(), but we failed to negotiate this part
with Bjoern. May be we will approach this problem again later.

Reported & tested by: Anton Yuzhaninov <citrin citrin.ru>
Sponsored by: Nginx, Inc


# 246482 07-Feb-2013 rrs

This fixes a out-of-order problem with several
of the newer drivers. The basic problem was
that the driver was pulling the mbuf off the
drbr ring and then when sending with xmit(), encounting
a full transmit ring. Thus the lower layer
xmit() function would return an error, and the
drivers would then append the data back on to the ring.
For TCP this is a horrible scenario sure to bring
on a fast-retransmit.

The fix is to use drbr_peek() to pull the data pointer
but not remove it from the ring. If it fails then
we either call the new drbr_putback or drbr_advance
method. Advance moves it forward (we do this sometimes
when the xmit() function frees the mbuf). When
we succeed we always call advance. The
putback will always copy the mbuf back to the top
of the ring. Note that the putback *cannot* be used
with a drbr_dequeue() only with drbr_peek(). We most
of the time, in putback, would not need to copy it
back since most likey the mbuf is still the same, but
sometimes xmit() functions will change the mbuf via
a pullup or other call. So the optimial case for
the single consumer is to always copy it back. If
we ever do a multiple_consumer (for lagg?) we
will need a test and atomic in the put back possibly
a seperate putback_mc() in the ring buf.

Reviewed by: jhb@freebsd.org, jlv@freebsd.org


# 241686 18-Oct-2012 andre

Mechanically remove the last stray remains of spl* calls from net*/*.
They have been Noop's for a long time now.


# 241646 17-Oct-2012 emax

provide helper if_initbaudrate() to set if_baudrate_pf and if_baudrate_pf.
again, use ixgbe(4) as an example of how to use new helper function.

Reviewed by: jhb
MFC after: 1 week


# 241616 16-Oct-2012 emax

introduce concept of ifi_baudrate power factor. the idea is to work
around the problem where high speed interfaces (such as ixgbe(4))
are not able to report real ifi_baudrate. bascially, take a spare
byte from struct if_data and use it to store ifi_baudrate power
factor. in other words,

real ifi_baudrate = ifi_baudrate * 10 ^ ifi_baudrate power factor

this should be backwards compatible with old binaries. use ixgbe(4)
as an example on how drivers would set ifi_baudrate power factor

Discussed with: kib, scottl, glebius
MFC after: 1 week


# 241037 28-Sep-2012 glebius

The drbr(9) API appeared to be so unclear, that most drivers in
tree used it incorrectly, which lead to inaccurate overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing neither mean
that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.

o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
ALTQ queueing or buf_ring(9) queueing.
- It doesn't touch the buf_ring stats any more.
- It doesn't touch ifnet stats anymore.
- drbr_stats_update() no longer exists.

o buf_ring(9) handles its stats itself:
- It handles br_drops itself.
- br_prod_bytes stats are dropped. Rationale: no one ever
reads them but update of a common counter on every packet
negatively affects performance due to excessive cache
invalidation.
- buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
we no longer account bytes.

o Drivers handle their stats theirselves: if_obytes, if_omcasts.

o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer
use drbr_stats_update(), and update ifnet stats theirselves.

o bxe(4) was the most correct driver, it didn't call
drbr_stats_update(), thus it was the only driver accurate under
moderate load. Now it also maintains stats itself.

o ixgbe(4) had already taken stats from hardware, so just
- drop software stats updating.
- take multicast packet count from hardware as well.

o mxge(4) just no longer needs NO_SLOW_STATS define.

o cxgb(4), cxgbe(4) need no change, since they obtain stats
from hardware.

Reviewed by: jfv, gnn


# 240112 04-Sep-2012 melifaro

Fix the build broken by r240099.
Hide link_pfil_hook under _KERNEL macro.

MFC after: 3 weeks


# 240099 04-Sep-2012 melifaro

Introduce new link-layer PFIL hook V_link_pfil_hook.
Merge ether_ipfw_chk() and part of bridge_pfil() into
unified ipfw_check_frame() function called by PFIL.
This change was suggested by rwatson? @ DevSummit.

Remove ipfw headers from ether/bridge code since they are unneeded now.

Note this thange introduce some (temporary) performance penalty since
PFIL read lock has to be acquired for every link-level packet.

MFC after: 3 weeks


# 238990 02-Aug-2012 glebius

Fix races between in_lltable_prefix_free(), lla_lookup(),
llentry_free() and arptimer():

o Use callout_init_rw() for lle timeout, this allows us safely
disestablish them.
- This allows us to simplify the arptimer() and make it
race safe.
o Consistently use ifp->if_afdata_lock to lock access to
linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that
is attached to the hash.
- Use LLE_LINKED to avoid double unlinking via consequent
calls to llentry_free().
- Mark lle with LLE_DELETED via |= operation istead of =,
so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more
consistent and provide more informative KASSERTs.

The patch is a collaborative work of all submitters and myself.

PR: kern/165863
Submitted by: Andrey Zonov <andrey zonov.org>
Submitted by: Ryan Stone <rysto32 gmail.com>
Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>


# 237263 19-Jun-2012 np

- Updated TOE support in the kernel.

- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the
works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)


# 233202 19-Mar-2012 jhb

Retire the IF_ADDR_LOCK() and IF_ADDR_UNLOCK() compat macros from HEAD.
The new [RW]LOCK macros are merged back to 8.x so should be suitable for
new code in HEAD even if it is to be MFC'd.


# 231229 08-Feb-2012 pluknet

g/c last bit of old ipv6 prefix management.

Reviewed by: bz
Obtained from: NetBSD, net/if.h, rev 1.80


# 229873 09-Jan-2012 jhb

Convert the per-interface address list lock from a mutex to a reader/writer
lock.

Reviewed by: bz


# 229614 05-Jan-2012 jhb

Add new variants of the IF_ADDR_*LOCK*() macros used for protecting
interface address lists that distinguish read locks from write locks.
To preserve the KPI, the previous operations are mapped to the write
lock macros. The lock is still kept as a mutex for now.

Reviewed by: bz
MFC after: 2 weeks


# 228571 16-Dec-2011 glebius

A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]


# 228380 09-Dec-2011 brooks

Remove the unused if_free_type() function.

X-MFC after: never


# 226830 27-Oct-2011 glebius

Add macro IF_DEQUEUE_ALL(ifq, m), that takes the entire mbuf chain off
the queue. It can be utilized in queue processing to avoid multiple
locking/unlocking.


# 224151 17-Jul-2011 bz

Add spares to the network stack for FreeBSD-9:
- TCP keep* timers
- TCP UTO (adjust from what was there already)
- netmap
- route caching
- user cookie (temporary to allow for the real fix)

Slightly re-shuffle struct ifnet moving fields out of the middle
of spares and to better align.

Discussed with: rwatson (slightly earlier version)


# 223739 03-Jul-2011 bz

Remove extra white space to comply with style for the rest of the struct.

MFC after: 2 weeks


# 223735 03-Jul-2011 bz

Add infrastructure to allow all frames/packets received on an interface
to be assigned to a non-default FIB instance.

You may need to recompile world or ports due to the change of struct ifnet.

Submitted by: cjsp
Submitted by: Alexander V. Chernikov (melifaro ipfw.ru)
(original versions)
Reviewed by: julian
Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru)
MFC after: 2 weeks
X-MFC: use spare in struct ifnet


# 219819 21-Mar-2011 jeff

- Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.


# 208553 25-May-2010 qingli

This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after: 3 days


# 205197 15-Mar-2010 mlaier

Fix a small bug in drbr_dequeue_cond spotted while preparing MFC of r203834.

MFC after: 3 days


# 203834 13-Feb-2010 mlaier

Fix drbr and altq interaction:
- introduce drbr_needs_enqueue that returns whether the interface/br needs
an enqueue operation: returns true if altq is enabled or there are
already packets in the ring (as we need to maintain packet order)
- update all drbr consumers
- fix drbr_flush
- avoid using the driver queue (IFQ_DRV_*) in the altq case as the
multiqueue consumer does not provide enough protection, serialize altq
interaction with the main queue lock
- make drbr_dequeue_cond work with altq

Discussed with: kmacy, yongari, jfv
MFC after: 4 weeks


# 203052 26-Jan-2010 delphij

Revised revision 199201 (add interface description capability as inspired
by OpenBSD), based on comments from many, including rwatson, jhb, brooks
and others.

Sponsored by: iXsystems, Inc.
MFC after: 1 month


# 202935 24-Jan-2010 syrinx

While flushing the multicast filter of an interface, do not zero the relevant
ifmultiaddr structures' reference to the parent interface, unless the parent
interface is really detaching. While here, program only link layer multicast
filters to a wlan's hardware parent interface.

PR: kern/142391, kern/142392
Reviewed by: sam, rpaolo, bms
MFC after: 1 week


# 202588 18-Jan-2010 thompsa

Declare a new EVENTHANDLER called iflladdr_event which signals that the L2
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.

The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.

PR: kern/142927
Submitted by: Nikolay Denev


# 201319 30-Dec-2009 qingli

Remove a deleted comment line that was brought back by
my previous commit.

MFC after: 5 days


# 201282 30-Dec-2009 qingli

The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations due to multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

MFC after: 5 days


# 200805 21-Dec-2009 jhb

Remove commented out prototype for ifinit(). This prototype has been
commented out since 1.1 and has not been present in <sys/systm.h> since at
least 1.1 of that file. It is also not needed in FreeBSD due to SYSINIT().


# 199975 30-Nov-2009 jhb

Remove if_timer/if_watchdog now that they are no longer used. The space
used by if_timer is reserved for expanding if_index to an int in the
future.

Reviewed by: rwatson, brooks


# 199231 12-Nov-2009 delphij

Revert revision 199201 for now as it has introduced a kernel vulnerability
and requires more polishing.


# 199201 11-Nov-2009 delphij

Add interface description capability as inspired by OpenBSD.

MFC after: 3 months


# 197227 15-Sep-2009 qingli

Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by: bz
MFC after: immediately


# 196510 24-Aug-2009 rwatson

Make if_grow static -- it's not used outside of if.c, and with the
internals destined to change, it's better if it remains that way.

MFC after: 3 days


# 196481 23-Aug-2009 rwatson

Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by: bz, julian
MFC after: 3 days


# 196263 15-Aug-2009 rwatson

Remove unused if_rawoutput() macro; it has been unused since at least
FreeBSD 2.

Approved by: re (kib)


# 195914 27-Jul-2009 qingli

This patch does the following:

- Allow loopback route to be installed for address assigned to
interface of IFF_POINTOPOINT type.
- Install loopback route for an IPv4 interface addreess when the
"useloopback" sysctl variable is enabled. Similarly, install
loopback route for an IPv6 interface address when the sysctl variable
"nd6_useloopback" is enabled. Deleting loopback routes for interface
addresses is unconditional in case these sysctl variables were
disabled after an interface address has been assigned.

Reviewed by: bz
Approved by: re


# 195727 16-Jul-2009 rwatson

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


# 195699 14-Jul-2009 rwatson

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# 195175 29-Jun-2009 brooks

Remove support for the /dev/net/* per-interface devices. They serve
little purpose and are unused in the base system.

The IOCTL functionality is entirely duplicated and routing sockets
provide a richer interface than the kqueue functionality.

Further, it is not practical for these devices to be made sensible in
the face of VIMAGE.

Bump __FreeBSD_version on the off chance that there is any code out
there that actually uses this stuff.

Reviewed by: rwatson
Discussed with: bz, zec
Approved by: re@ (kensmith)


# 195020 25-Jun-2009 rwatson

Define four wrapper functions for interface address locking,
if_addr_rlock() and if_addr_runlock() for regular address lists, and
if_maddr_rlock() and if_maddr_runlock() for multicast address lists.

We will use these in various kernel modules to avoid encoding specific
type and locking strategy information into modules that currently use
IF_ADDR_LOCK() and IF_ADDR_UNLOCK() directly.

MFC after: 6 weeks


# 194622 22-Jun-2009 rwatson

Add a new function, ifa_ifwithaddr_check(), which rather than returning
a pointer to an ifaddr matching the passed socket address, returns a
boolean indicating whether one was present. In the (near) future,
ifa_ifwithaddr() will return a referenced ifaddr rather than a raw
ifaddr pointer, and the new wrapper will allow callers that care only
about the boolean condition to avoid having to free that reference.

MFC after: 3 weeks


# 194602 21-Jun-2009 rwatson

Clean up common ifaddr management:

- Unify reference count and lock initialization in a single function,
ifa_init().
- Move tear-down from a macro (IFAFREE) to a function ifa_free().
- Move reference count bump from a macro (IFAREF) to a function ifa_ref().
- Instead of using a u_int protected by a mutex to refcount(9) for
reference count management.

The ifa_mtx is now used for exactly one ioctl, and possibly should be
removed.

MFC after: 3 weeks


# 194518 19-Jun-2009 kmacy

add helper function for flushing software queues


# 194259 15-Jun-2009 sam

r193336 moved ifq_detach to if_free which broke if_alloc followed
by if_free (w/o doing if_attach); move ifq_attach to if_alloc and
rename ifq_attach/detach to ifq_init/ifq_delete to better identify
their purpose

Reviewed by: jhb, kmacy


# 193848 09-Jun-2009 kmacy

- add drbr routines for accessing #qentries and conditionally dequeueing
- track bytes enqueued in buf_ring


# 193731 08-Jun-2009 zec

Introduce an infrastructure for dismantling vnet instances.

Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state. The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.

While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers. Many of such issues are
already known and will be incrementaly fixed over the next weeks in
smaller incremental commits.

Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels. Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.

Bump __FreeBSD_version to 800097.
Reviewed by: bz, julian
Approved by: rwatson, kib (re), julian (mentor)


# 193096 30-May-2009 attilio

When user_frac in the polling subsystem is low it is going to busy the
CPU for too long period than necessary. Additively, interfaces are kept
polled (in the tick) even if no more packets are available.
In order to avoid such situations a new generic mechanism can be
implemented in proactive way, keeping track of the time spent on any
packet and fragmenting the time for any tick, stopping the processing
as soon as possible.

In order to implement such mechanism, the polling handler needs to
change, returning the number of packets processed.
While the intended logic is not part of this patch, the polling KPI is
broken by this commit, adding an int return value and the new flag
IFCAP_POLLING_NOCOUNT (which will signal that the return value is
meaningless for the installed handler and checking should be skipped).

Bump __FreeBSD_version in order to signal such situation.

Reviewed by: emaste
Sponsored by: Sandvine Incorporated


# 192605 22-May-2009 zec

Introduce the if_vmove() function, which will be used in the future
for reassigning ifnets from one vnet to another.

if_vmove() works by calling a restricted subset of actions normally
executed by if_detach() on an ifnet in the current vnet, and then
switches to the target vnet and executes an appropriate subset of
if_attach() actions there.

if_attach() and if_detach() have become wrapper functions around
if_attach_internal() and if_detach_internal(), where the later
variants have an additional argument, a flag indicating whether a
full attach or detach sequence is to be executed, or only a
restricted subset suitable for moving an ifnet from one vnet to
another. Hence, if_vmove() will not call if_detach() and if_attach()
directly, but will call the if_detach_internal() and
if_attach_internal() variants instead, with the vmove flag set.

While here, staticize ifnet_setbyindex() since it is not referenced
from outside of sys/net/if.c.

Also rename ifccnt field in struct vimage to ifcnt, and do some minor
whitespace garbage collection where appropriate.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by: bz, rwatson, brooks?
Approved by: julian (mentor)


# 191816 05-May-2009 zec

Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one. The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE(). Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by: julian (mentor)


# 191688 30-Apr-2009 zec

Permit buiding kernels with options VIMAGE, restricted to only a single
active network stack instance. Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables. As an example, V_ifnet becomes:

options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet
default build: vnet_net_0._ifnet
options VIMAGE_GLOBALS: ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

INIT_VNET_NET(ifp->if_vnet); becomes

struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals. If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet. options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod. SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by: bz, rwatson
Approved by: julian (mentor)


# 191423 23-Apr-2009 rwatson

Add ifunit_ref(), a version of ifunit(), that returns not just an
interface pointer, but also a reference to it.

Modify ifioctl() to use ifunit_ref(), holding the reference until
all ioctls, etc, have completed.

This closes a class of reader-writer races in which interfaces
could be removed during long-running ioctls, leading to crashes.
Many other consumers of ifunit() should now use ifunit_ref() to
avoid similar races.

MFC after: 3 weeks


# 191418 23-Apr-2009 rwatson

During if_detach(), invoke if_dead() to set the ifnet's function
pointers to "dead" implementations that no-op rather than invoking
the device driver. This would generally be unexpected and
possibly quite badly handled by most device drivers after
if_detach() has completed.

Reviewed by: bms
MFC after: 3 weeks


# 191367 21-Apr-2009 rwatson

Start to address a number of races relating to use of ifnet pointers
after the corresponding interface has been destroyed:

(1) Add an ifnet refcount, ifp->if_refcount. Initialize it to 1 in
if_alloc(), and modify if_free_type() to decrement and check the
refcount.

(2) Add new if_ref() and if_rele() interfaces to allow kernel code
walking global interface lists to release IFNET_[RW]LOCK() yet
keep the ifnet stable. Currently, if_rele() is a no-op wrapper
around if_free(), but this may change in the future.

(3) Add new ifnet field, if_alloctype, which caches the type passed
to if_alloc(), but unlike if_type, won't be changed by drivers.
This allows asynchronous free's of the interface after the
driver has released it to still use the right type. Use that
instead of the type passed to if_free_type(), but assert that
they are the same (might have to rethink this if that doesn't
work out).

(4) Add a new ifnet_byindex_ref(), which looks up an interface by
index and returns a reference rather than a pointer to it.

(5) Fix if_alloc() to fully initialize the if_addr_mtx before hooking
up the ifnet to global lists.

(6) Modify sysctls in if_mib.c to use ifnet_byindex_ref() and release
the ifnet when done.

When this change is MFC'd, it will need to replace if_ispare fields
rather than adding new fields in order to avoid breaking the binary
interface. Once this change is MFC'd, if_free_type() should be
removed, as its 'type' argument is now optional.

This refcount is not appropriate for counting mbuf pkthdr references,
and also not for counting entry into the device driver via ifnet
function pointers. An rmlock may be appropriate for the latter.
Rather, this is about ensuring data structure stability when reaching
an ifnet via global ifnet lists and tables followed by copy in or out
of userspace.

MFC after: 3 weeks
Reported by: mdtancsa
Reviewed by: brooks


# 191161 16-Apr-2009 kmacy

export if_qflush for use by driver if_qflush routines
only set ifp->if_{transmit, qflush} if not already set
KASSERT that neither or both are set


# 191148 16-Apr-2009 kmacy

Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by: rwatson


# 191033 13-Apr-2009 kmacy

Adapt buf_ring abstraction interface to allow consumers to interoperate with ALTQ


# 189851 15-Mar-2009 rwatson

Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free. This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile. They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

if_ar
if_axe
if_aue
if_cdce
if_cue
if_kue
if_ray
if_rue
if_rum
if_sr
if_udav
if_ural
if_zyd

Drivers that were already disabled because of tty changes:

if_ppp
if_sl

Discussed on: arch@


# 189230 01-Mar-2009 rwatson

Do a bit of struct ifnet cleanup in preparation for 8.0: group function
pointers together, move padding to the bottom of the structure, and add
two new integer spares due to attrition over time. Remove unused spare
"flags" field, we can use one of the spare ints if we need it later.

This change requires a rebuild of device driver modules that depend on
the layout of ifnet for binary compatibility reasons.

Discussed with: kmacy


# 186213 17-Dec-2008 kmacy

Keep stats in drbr_enqueue

Discussed with: ps


# 186207 17-Dec-2008 kmacy

merge in 2 buf_ring helper routines for enqueueing and freeing buf_rings


# 186199 16-Dec-2008 kmacy

convert ifnet and afdata locks from mutexes to rwlocks


# 186119 15-Dec-2008 qingli

This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle, Kip has also been conducting
active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
me maintaining that branch before the svn conversion


# 186048 13-Dec-2008 bz

Second round of putting global variables, which were virtualized
but formerly missed under VIMAGE_GLOBAL.

Put the extern declarations of the virtualized globals
under VIMAGE_GLOBAL as the globals themsevles are already.
This will help by the time when we are going to remove the globals
entirely.

Sponsored by: The FreeBSD Foundation


# 185571 02-Dec-2008 bz

Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


# 185162 22-Nov-2008 kmacy

- bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
allow drivers to efficiently manage multiple hardware queues
(i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.


# 183550 02-Oct-2008 zec

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 181900 20-Aug-2008 thompsa

ifnet_setbyindex() is only used locally, go back to being static.


# 181892 20-Aug-2008 kmacy

Fix build


# 180042 26-Jun-2008 rwatson

Introduce locking around use of ifindex_table, whose use was previously
unsynchronized. While races were extremely rare, we've now had a
couple of reports of panics in environments involving large numbers of
IPSEC tunnels being added very quickly on an active system.

- Add accessor functions ifnet_byindex(), ifaddr_byindex(),
ifdev_byindex() to replace existing accessor macros. These functions
now acquire the ifnet lock before derefencing the table.
- Add IFNET_WLOCK_ASSERT().
- Add static accessor functions ifnet_setbyindex(), ifdev_setbyindex(),
which set values in the table either asserting of acquiring the ifnet
lock.
- Use accessor functions throughout if.c to modify and read
ifindex_table.
- Rework ifnet attach/detach to lock around ifindex_table modification.

Note that these changes simply close races around use of ifindex_table,
and make no attempt to solve the probem of disappearing ifnets. Further
refinement of this work, including with respect to ifindex_table
resizing, is still required.

In a future change, the ifnet lock should be converted from a mutex to an
rwlock in order to reduce contention.

Reviewed and tested by: brooks


# 178888 09-May-2008 julian

Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as a EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extent that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility call setfib
that acts a bit like nice..

setfib -3 ping target.example.com # will use fib 3 for ping.

It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.

2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being reponded to.

6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect withthe proces that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.

Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes
accordingly.

ipfw has grown 2 new keywords:

setfib N ip from anay to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed it raises the possibilty of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the desination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco


# 177617 25-Mar-2008 sam

expose if_purgemaddrs, it will be used by the vap code unless someone
redesigns the mcast support code in the next few weeks

MFC after: 3 weeks


# 174388 06-Dec-2007 kmacy

Add padding for anticipated functionality
- vimage
- TOE
- multiq
- host rtentry caching

Rename spare used by 80211 to if_llsoftc

Reviewed by: rwatson, gnn
MFC after: 1 day


# 169614 16-May-2007 brooks

The struct if_data members ifi_recvquota and ifi_xmitquota have been
unused for ages. Rename them to ifi_spare_char1 and ifi_spare_char2
respectively to indicate this face.


# 168793 16-Apr-2007 thompsa

Rename the trunk(4) driver to lagg(4) as it is too similar to vlan trunking.

The name trunk is misused as the networking term trunk means carrying multiple
VLANs over a single connection. The IEEE standard for link aggregation (802.3
section 3) does not talk about 'trunk' at all while it is used throughout IEEE
802.1Q in describing vlans.

The lagg(4) driver provides link aggregation, failover and fault tolerance.

Discussed on: current@


# 168561 09-Apr-2007 thompsa

Add the trunk(4) driver for providing link aggregation, failover and fault
tolerance. This driver allows aggregation of multiple network interfaces as
one virtual interface using a number of different protocols/algorithms.

failover - Sends traffic through the secondary port if the master becomes
inactive.
fec - Supports Cisco Fast EtherChannel.
lacp - Supports the IEEE 802.3ad Link Aggregation Control Protocol
(LACP) and the Marker Protocol.
loadbalance - Static loadbalancing using an outgoing hash.
roundrobin - Distributes outgoing traffic using a round-robin scheduler
through all active ports.

This code was obtained from OpenBSD and this also includes 802.3ad LACP support
from agr(4) in NetBSD.


# 167732 20-Mar-2007 bms

Fix tinderbox; ng_ether needs to see if_findmulti().


# 167729 19-Mar-2007 bms

Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.

This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.

With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.

Obtained from: p4 branch bms_netdev
Reviewed by: andre
Sponsored by: Garance A Drosehn
Idea from: NetBSD
MFC after: 1 month


# 162070 06-Sep-2006 andre

Improve description of if_capabilities, if_capenable and ifi_hwassist.

Sponsored by: TCP/IP Optimization Fundraise 2005


# 162068 06-Sep-2006 andre

Fix the socket option IP_ONESBCAST by giving it its own case in ip_output()
and skip over the normal IP processing.

Add a supporting function ifa_ifwithbroadaddr() to verify and validate the
supplied subnet broadcast address.

PR: kern/99558
Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru>
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days


# 160981 04-Aug-2006 brooks

With exception of the if_name() macro, all definitions in net_osdep.h
were unused or already in if_var.h so add if_name() to if_var.h and
remove net_osdep.h along with all references to it.

Longer term we may want to kill off if_name() entierly since all modern
BSDs have if_xname variables rendering it unnecessicary.


# 159781 19-Jun-2006 mlaier

Import interface groups from OpenBSD. This allows to group interfaces in
order to - for example - apply firewall rules to a whole group of
interfaces. This is required for importing pf from OpenBSD 3.9

Obtained from: OpenBSD (with changes)
Discussed on: -net (back in April)


# 155051 30-Jan-2006 glebius

Merge the //depot/user/yar/vlan branch into CVS. It contains some collective
work by yar, thompsa and myself. The checksum offloading part also involves
work done by Mihail Balikov.

The most important changes:

o Instead of global linked list of all vlan softc use a per-trunk
hash. The size of hash is dynamically adjusted, depending on
number of entries. This changes struct ifnet, replacing counter
of vlans with a pointer to trunk structure. This change is an
improvement for setups with big number of VLANs, several interfaces
and several CPUs. It is a small regression for a setup with a single
VLAN interface.
An alternative to dynamic hash is a per-trunk static array with
4096 entries, which is a compile time option - VLAN_ARRAY. In my
experiments the array is not an improvement, probably because such
a big trunk structure doesn't fit into CPU cache.
o Introduce an UMA zone for VLAN tags. Since drivers depend on it,
the zone is declared in kern_mbuf.c, not in optional vlan(4) driver.
This change is a big improvement for any setup utilizing vlan(4).
o Use rwlock(9) instead of mutex(9) for locking. We are the first
ones to do this! :)
o Some drivers can do hardware VLAN tagging + hardware checksum
offloading. Add an infrastructure for this. Whenever vlan(4) is
attached to a parent or parent configuration is changed, the flags
on vlan(4) interface are updated.

In collaboration with: yar, thompsa
In collaboration with: Mihail Balikov <mihail.balikov interbgc.com>


# 152315 11-Nov-2005 ru

- Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.

- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.


# 152209 08-Nov-2005 thompsa

Move the cloned interface list management in to if_clone. For some drivers the
softc lists and associated mutex are now unused so these have been removed.

Calling if_clone_detach() will now destroy all the cloned interfaces for the
driver and in most cases is all thats needed to unload.

Idea by: brooks
Reviewed by: brooks


# 150789 01-Oct-2005 glebius

Big polling(4) cleanup.

o Axe poll in trap.

o Axe IFF_POLLING flag from if_flags.

o Rework revision 1.21 (Giant removal), in such a way that
poll_mtx is not dropped during call to polling handler.
This fixes problem with idle polling.

o Make registration and deregistration from polling in a
functional way, insted of next tick/interrupt.

o Obsolete kern.polling.enable. Polling is turned on/off
with ifconfig.

Detailed kern_poll.c changes:
- Remove polling handler flags, introduced in 1.21. The are not
needed now.
- Forget and do not check if_flags, if_capenable and if_drv_flags.
- Call all registered polling handlers unconditionally.
- Do not drop poll_mtx, when entering polling handlers.
- In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
- In netisr_poll() axe the block, where polling code asks drivers
to unregister.
- In netisr_poll() and ether_poll() do polling always, if any
handlers are present.
- In ether_poll_[de]register() remove a lot of error hiding code. Assert
that arguments are correct, instead.
- In ether_poll_[de]register() use standard return values in case of
error or success.
- Introduce poll_switch() that is a sysctl handler for kern.polling.enable.
poll_switch() goes through interface list and enabled/disables polling.
A message that kern.polling.enable is deprecated is printed.

Detailed driver changes:
- On attach driver announces IFCAP_POLLING in if_capabilities, but
not in if_capenable.
- On detach driver calls ether_poll_deregister() if polling is enabled.
- In polling handler driver obtains its lock and checks IFF_DRV_RUNNING
flag. If there is no, then unlocks and returns.
- In ioctl handler driver checks for IFCAP_POLLING flag requested to
be set or cleared. Driver first calls ether_poll_[de]register(), then
obtains driver lock and [dis/en]ables interrupts.
- In interrupt handler driver checks IFCAP_POLLING flag in if_capenable.
If present, then returns.This is important to protect from spurious
interrupts.

Reviewed by: ru, sam, jhb


# 148886 09-Aug-2005 rwatson

Rename IFF_RUNNING to IFF_DRV_RUNNING, IFF_OACTIVE to IFF_DRV_OACTIVE,
and move both flags from ifnet.if_flags to ifnet.if_drv_flags, making
and documenting the locking of these flags the responsibility of the
device driver, not the network stack. The flags for these two fields
will be mutually exclusive so that they can be exposed to user space as
though they were stored in the same variable.

Provide #defines to provide the old names #ifndef _KERNEL, so that user
applications (such as ifconfig) can use the old flag names. Using the
old names in a device driver will result in a compile error in order to
help device driver writers adopt the new model.

When exposing the interface flags to user space, via interface ioctls
or routing sockets, or the two fields together. Since the driver flags
cannot currently be set for user space, no new logic is currently
required to handle this case.

Add some assertions that general purpose network stack routines, such
as if_setflags(), are not improperly used on driver-owned flags.

With this change, a large number of very minor network stack races are
closed, subject to correct device driver locking. Most were likely
never triggered.

Driver sweep to follow; many thanks to pjd and bz for the line-by-line
review they gave this patch.

Reviewed by: pjd, bz
MFC after: 7 days


# 148652 02-Aug-2005 rwatson

Protect link layer network interface multicast address list manipulation
using ifp->if_addr_mtx:

- Initialize if_addr_mtx when ifnet is initialized.

- Destroy if_addr_mtx when ifnet is torn down.

- Rename ifmaof_ifpforaddr() to if_findmulti(); assert if_addr_mtx.
Staticize.

- Extract ifmultiaddr allocation and initialization into if_allocmulti();
accept a 'mflags' argument to indicate whether or not sleeping is
permitted. This centralizes error handling and address duplication.

- Extract ifmultiaddr tear-down and deallocation in if_freemulti().

- Re-structure if_addmulti() to hold if_addr_mtx around manipulation of
the ifnet multicast address list and reference count manipulation.
Make use of non-sleeping allocations. Annotate the fact that we only
generate routing socket events for explicit address addition, not
implicit link layer address addition.

- Re-structure if_delmulti() to hold if_addr_mtx around manipulation of
the ifnet multicast address list and reference count manipulation.
Annotate the lack of a routing socket event for implicit link layer
address removal.

- De-spl all and sundry.

Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after: 1 week


# 148640 02-Aug-2005 rwatson

Add if_addr_mtx to struct ifnet, a mutex to protect ifnet-related address
lists. Add accessor macros.

This changes the size of struct ifnet, but ideally, all ifnet consumers
are now using if_alloc() to allocate these structures rather than
embedding them into device driver softc's, so this won't modify the
network device driver ABI.

MFC after: 1 week


# 148265 21-Jul-2005 rwatson

Allocate one of the spare ifnet integer fields to hold if_drv_flags,
which in the future will hold IFF_OACTIVE and IFF_RUNNING, and have
its access synchronized by the device driver rather than the
protocol stack. This will avoid potential races in the management
of flags in if_flags.

Discussed with: various (scottl, jhb, ...)
MFC after: 1 week


# 147256 10-Jun-2005 brooks

Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.

Reviewed by: sobomax, sam


# 146986 05-Jun-2005 thompsa

Add hooks into the networking layer to support if_bridge. This changes struct
ifnet so a buildworld is necessary.

Approved by: mlaier (mentor)
Obtained from: NetBSD


# 146620 25-May-2005 peadar

Separate out address-detaching part of if_detach into if_purgeaddrs,
so if_tap doesn't need to rely on locally-rolled code to do same.

The observable symptom of if_tap's bzero'ing the address details
was a crash in "ifconfig tap0" after an if_tap device was closed.

Reported By: Matti Saarinen (mjsaarin at cc dot helsinki dot fi)


# 145320 20-Apr-2005 glebius

Do not call all link state callbacks directly, but schedule
a taskqueue(9) task. This fixes LORs and adds possibility
to serve such events pseudorecursively, when link state
change of interface causes subsequent change on other
interfaces.

Sponsored by: Rambler
Reviewed by: sam, brooks, mux


# 142901 01-Mar-2005 glebius

Revert change to struct ifnet. Use ifnet pointer in softc. Embedding
ifnet into smth will soon be removed.

Requested by: brooks


# 142564 26-Feb-2005 glebius

Remove carp_softc.sc_ifp member in favor of union pointers in struct ifnet.

Obtained from: OpenBSD


# 142215 22-Feb-2005 glebius

Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by: mlaier
Obtained from: OpenBSD (mickey, mcbride)


# 139823 06-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 138542 08-Dec-2004 sam

Cleanup link state change notification:
o add new if_link_state_change routine that deals with link state changes
o change mii to use if_link_state_change


# 137476 09-Nov-2004 mlaier

Remove the #if 0 wrapping around !ALTQ stuff that can't be used due to ABI
stability anyway.


# 137065 30-Oct-2004 rwatson

Move if_handoff() from an inline in if_var.h to a function to if.c
in orden to harden the ABI for 5.x; this will permit us to modify
the locking in the ifnet packet dispatch without requiring drivers
to be recompiled.

MFC after: 3 days
Discussed at: EuroBSDCon Developer's Summit


# 137062 30-Oct-2004 rwatson

Add additional "spare" fields to 'struct ifnet' in order to improve
the resistance of the network driver ABI to changes that will be
required as we optimize locking.

MFC after: 3 days
Discussed at: Developer Summit


# 136950 25-Oct-2004 jmg

use NULL instead of 0 when casting/comparing w/ a pointer...


# 136704 19-Oct-2004 rwatson

Define IFF_LOCKGIANT() and IFF_UNLOCKGIANT() macros, which conditionally
acquire Giant if the passed interface has IFF_NEEDSGIANT set on it.
Modify calls into (ifp)->if_ioctl() in if.c to use these macros in order
to ensure that Giant is held.

MFC after: 3 days
Bumped into by: jmg


# 133741 15-Aug-2004 jmg

Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers. Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks. Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by: green, rwatson (both earlier versions)


# 133261 07-Aug-2004 mlaier

Add a "void *if_carp" placeholder to struct ifnet with prospect to bring in
the "Common address redundancy protocol" (CARP) during the 5-STABLE cycle.
Hence doing the ABI break now.

Approved by: re (scottl)


# 132712 27-Jul-2004 rwatson

Add a new network interface flag, IFF_NEEDSGIANT, which will allow
device drivers to declare that the ifp->if_start() method implemented
by the driver requires Giant in order to operate correctly.

Add a 'struct task' to 'struct ifnet' that can be used to execute a
deferred ifp->if_start() in the event that if_start needs to be called
in a Giant-free environment. To do this, introduce if_start(), a
wrapper function for ifp->if_start(). If the interface can run MPSAFE,
it directly dispatches into the interface start routine. If it can't
run MPSAFE, we're running with debug.mpsafenet != 0, and Giant isn't
currently held, the task is queued to execute in a swi holding Giant
via if_start_deferred().

Modify if_handoff() to use if_start() instead of direct dispatch.
Modify 802.11 to use if_start() instead of direct dispatch.

This is intended to provide increased compatibility for non-MPSAFE
network device drivers in the presence of Giant-free operation via
asynchronous dispatch. However, this commit does not mark any network
interfaces as IFF_NEEDSGIANT.


# 132152 14-Jul-2004 mlaier

Fix a copy-and-paste-o in IFQ_DRV_PREPEND - all pointyhats to me.
While here also fix a (not less stupid) braino in IFQ_DRV_PURGE.

Reported-by: clement
Tested-by: clement (_PREPEND in sis(4))


# 130933 22-Jun-2004 brooks

Major overhaul of pseudo-interface cloning. Highlights include:

- Split the code out into if_clone.[ch].
- Locked struct if_clone. [1]
- Add a per-cloner match function rather then simply matching names of
the form <name><unit> and <name>.
- Use the match function to allow creation of <interface>.<tag>
vlan interfaces. The old way is preserved unchanged!
- Also the match function to allow creation of stf(4) interfaces named
stf0, stf, or 6to4. This is the only major user visible change in
that "ifconfig stf" creates the interface stf rather then stf0 and
does not print "stf0" to stdout.
- Allow destroy functions to fail so they can refuse to delete
interfaces. Currently, we forbid the deletion of interfaces which
were created in the init function, particularly lo0, pflog0, and
pfsync0. In the case of lo0 this was a panic implementation so it
does not count as a user visiable change. :-)
- Since most interfaces do not need the new functionality, an family of
wrapper functions, ifc_simple_*(), were created to wrap old style
cloner functions.
- The IF_CLONE_INITIALIZER macro is replaced with a new incompatible
IFC_CLONE_INITIALIZER and ifc_simple consumers use IFC_SIMPLE_DECLARE
instead.

Submitted by: Maurycy Pawlowski-Wieronski <maurycy at fouk.org> [1]
Reviewed by: andre, mlaier
Discussed on: net


# 130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


# 130512 15-Jun-2004 mlaier

Fix a typeo in IFQ_HANDOFF.


# 130508 14-Jun-2004 mlaier

Transform tbr_dequeue into a function pointer in order to build drivers with
ALTQ enabled versions of IFQ_* macros by default, as requested by serveral
others. This is a follow-up to the quick fix I committed yesterday which
turned off the ALTQ checks for non-ALTQ kernels.


# 130449 14-Jun-2004 mlaier

Unbreak non-ALTQ kernel linking. I forgot about tbr_dequeue.

In the end drivers should be building with ALTQ checks by default, but for
now build them with the old macros for non-ALTQ kernels.

Note: Check new features w/ LINT *and* w/ LINT minus the new feature.

Found-by: rwatson


# 130416 13-Jun-2004 mlaier

Link ALTQ to the build and break with ABI for struct ifnet. Please recompile
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
seperate commit. Same with the driver changes, which need case by case
evaluation.

__FreeBSD_version bump will follow.

Tested-by: (i386)LINT


# 128871 03-May-2004 andre

Link state change notification of ethernet media to the routing socket.

o Extend the if_data structure with an ifi_link_state field and
provide the corresponding defines for the valid states.

o The mii_linkchg() callback updates the ifi_link_state field
and calls rt_ifmsg() to notify listeners on the routing socket
in addition to the kqueue KNOTE.

o If vlans are configured on a physical interface notify and update
all vlan pseudo devices as well with the vlan_link_state() callback.

No objections by: sam, wpaul, ru, bms
Brucification by: bde


# 128407 18-Apr-2004 mlaier

Make if_(un)route static in if.c as they are called from if_up/if_down only.
This is also cleanup to make locking easier.

Reviewed by: luigi
Approved by: bms(mentor)


# 128376 17-Apr-2004 luigi

+ rename and document an unused field in struct arpcom (field is still
there so there are no ABI changes);
+ replace 5 redefinitions of the IPF2AC macro with one in if_arp.h

Eventually (but before freezing the ABI) we need to get rid of
struct arpcom (initially with the help of some smart #defines
to avoid having to touch each and every driver, see below).

Apart from the struct ifnet, struct arpcom now only stores a copy
of the MAC address (ac_enaddr, but we already have another copy in
the struct ifnet -- if_addrhead), and a netgraph-specific field
which is _always_ accessed through the ifp, so it might well go
into the struct ifnet too (where, besides, there is already an entry
for AF_NETGRAPH data...)

Too bad ac_enaddr is widely referenced by all drivers. But
this can be fixed as follows:

#define ac_enaddr ac_if.the_original_ac_enaddr_in_struct_ifnet

(note that the right hand side would likely be a pointer rather than
the base address of an array.)


# 128315 16-Apr-2004 luigi

Documented the intended usage of if_addrhead and ifaddr_byindex()
This commit only changes comments. Nothing to recompile.


# 128291 15-Apr-2004 luigi

Document the way if_addrhead and struct ifaddr are used.
Remove a member from 'struct ifaddr' which has been in an
#ifdef notdef block since rev 1.1

No ABI changes -- no need to recompile anything.


# 128157 12-Apr-2004 ru

Count outgoing link-level broadcast packets in if_omcasts.
I'm not sure this is completely correct but at least this
is consistent with the accounting of incoming broadcasts.

PR: kern/65273
Submitted by: David J Duchscher <daved@tamu.edu>


# 128124 11-Apr-2004 rwatson

In 4.x, if_ipending is used to track network interrupt state. In 5.x,
it is no longer used, so GC the ifnet.if_ipending field.


# 128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# 127828 04-Apr-2004 luigi

+ arpresolve(): remove an unused argument
+ struct ifnet: remove unused fields, move ipv6-related field close
to each other, add a pointer to l3<->l2 translation tables (arp,nd6,
etc.) for future use.

+ struct route: remove an unused field, move close to each
other some fields that might likely go away in the future


# 126900 13-Mar-2004 brooks

Remove if_withname. It came in with the KAME import, but never got
used. Should someone need its functionality, it's a really expensive
implementation of:
ifnet_byindex(sdl->sdl_index)

Reviewed by: bde, ume


# 126264 26-Feb-2004 mlaier

Bring eventhandler callbacks for pf.
This enables pf to track dynamic address changes on interfaces (dailup) with
the "on (<ifname>)"-syntax. This also brings hooks in anticipation of
tracking cloned interfaces, which will be in future versions of pf.

Approved by: bms(mentor)


# 123220 07-Dec-2003 imp

Make the if_broadcastaddr const. All the drivers in the tree which
violated the constness were corrected before the freeze. This was
suggested by mdodd@, I think, and sam@ and others have signed off on
this if I recall my conversations with them correctly.


# 122524 12-Nov-2003 rwatson

Modify the MAC Framework so that instead of embedding a (struct label)
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c). This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE. This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time. This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.

This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.

While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) do to memory
conservation and reduced cost of zeroing memory.

NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change. Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.

Suggestions from: bmilekic
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 121816 31-Oct-2003 brooks

Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)


# 121470 24-Oct-2003 ume

Since dp->dom_ifattach calls malloc() with M_WAITOK, we cannot
use mutex lock directly here. Protect ifp->if_afdata instead.

Reported by: grehan


# 121161 17-Oct-2003 ume

- add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.

Obtained from: KAME


# 108533 01-Jan-2003 schweikh

Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.


# 108470 30-Dec-2002 schweikh

Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.


# 108298 26-Dec-2002 hsu

Long chain of calls starting with bridge_on(), going through IPv6, and
ending up at ifa_ifwithdstaddr() could lead to a recursive lock of
the ifnet list mutex.


# 108172 22-Dec-2002 hsu

SMP locking for ifnet list.


# 108036 18-Dec-2002 hsu

Switch to the conventional reference counting scheme.


# 108033 18-Dec-2002 hsu

Lock up ifaddr reference counts.


# 106931 14-Nov-2002 sam

o add if_nvlans member to track the number of vlans active on an interface
o add if_input member for interface drivers to call through to pass packets "up"
o remove ethernet-specific function decls (moved to ethernet.h)

Reviewed by: many
Approved by: re


# 104140 29-Sep-2002 bde

Fixed some of the namespace pollution in rev.1.33. <sys/systm.h> was
included here because it was once a prerequisite of <sys/mutex.h>
although that bug was fixed long ago.


# 103900 24-Sep-2002 brooks

Add a new helper function if_printf() modeled on device_printf(). The
function takes a struct ifnet pointer followed by the usual printf
arguments and prints "<interfacename>: " before the results of printf.
Since this is the primary form of printf calls in network device drivers
and accounts for most uses of the ifnet menber if_unit, this
significantly simplifies many printf()s.


# 102052 18-Aug-2002 sobomax

Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid
breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in
SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's.

Reviewed by: -hackers, -net


# 101849 13-Aug-2002 rwatson

Move to nested include of _label.h instead of mac.h, reducing namespace
pollution.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
Suggested by: bde


# 100992 30-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Label network interface structures, permitting security features to
be maintained on those objects. if_label will be used to authorize
data flow using the network interface. if_label will be protected
using the same synchronization primitives as other mutable entries
in struct ifnet.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 96173 07-May-2002 imp

Minor style nit


# 92725 19-Mar-2002 alfred

Remove __P.


# 87914 14-Dec-2001 jlemon

whitespace fixes.


# 87902 14-Dec-2001 luigi

Device Polling code for -current.

Non-SMP, i386-only, no polling in the idle loop at the moment.

To use this code you must compile a kernel with

options DEVICE_POLLING

and at runtime enable polling with

sysctl kern.polling.enable=1

The percentage of CPU reserved to userland can be set with

sysctl kern.polling.user_frac=NN (default is 50)

while the remainder is used by polling device drivers and netisr's.
These are the only two variables that you should need to touch. There
are a few more parameters in kern.polling but the default values
are adequate for all purposes. See the code in kern_poll.c for
more details on them.

Polling in the idle loop will be implemented shortly by introducing
a kernel thread which does the job. Until then, the amount of CPU
dedicated to polling will never exceed (100-user_frac).
The equivalent (actually, better) code for -stable is at

http://info.iet.unipi.it/~luigi/polling/

and also supports polling in the idle loop.

NOTE to Alpha developers:
There is really nothing in this code that is i386-specific.
If you move the 2 lines supporting the new option from
sys/conf/{files,options}.i386 to sys/conf/{files,options} I am
pretty sure that this should work on the Alpha as well, just that
I do not have a suitable test box to try it. If someone feels like
trying it, I would appreciate it.

NOTE to other developers:
sure some things could be done better, and as always I am open to
constructive criticism, which a few of you have already given and
I greatly appreciated.
However, before proposing radical architectural changes, please
take some time to possibly try out this code, or at the very least
read the comments in kern_poll.c, especially re. the reason why I
am using a soft netisr and cannot (I believe) replace it with a
simple timeout.

Quick description of files touched by this commit:

sys/conf/files.i386
new file kern/kern_poll.c
sys/conf/options.i386
new option
sys/i386/i386/trap.c
poll in trap (disabled by default)
sys/kern/kern_clock.c
initialization and hardclock hooks.
sys/kern/kern_intr.c
minor swi_net changes
sys/kern/kern_poll.c
the bulk of the code.
sys/net/if.h
new flag
sys/net/if_var.h
declaration for functions used in device drivers.
sys/net/netisr.h
NETISR_POLL
sys/dev/fxp/if_fxp.c
sys/dev/fxp/if_fxpvar.h
sys/pci/if_dc.c
sys/pci/if_dcreg.h
sys/pci/if_sis.c
sys/pci/if_sisreg.h
device driver modifications


# 86797 22-Nov-2001 luigi

Expand the comment on the layout of softc, arpcom and ifnet structures,
and list the places where the assumption is used.


# 86364 14-Nov-2001 jhb

Remove ifnet.if_mpsafe for now. If this is needed, it won't be needed
until much later when the network stack locking is farther along.

Approved by: jlemon


# 85074 17-Oct-2001 ru

Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2.

Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *''
as the argument. Pass rt_addrinfo all the way down to rtrequest1
and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now
``rt_addrinfo *'' instead of ``sockaddr *'' (almost noone is
using it anyways).

Benefit: the following command now works. Previously we needed
two route(8) invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

Remove unsafe typecast in rtrequest(), from ``rtentry *'' to
``sockaddr *''. It was introduced by 4.3BSD-Reno and never
corrected.

Obtained from: BSD/OS, NetBSD
MFC after: 1 month
PR: kern/28360


# 84931 14-Oct-2001 fjoe

bring in ARP support for variable length link level addresses

Reviewed by: jdp
Approved by: jdp
Obtained from: NetBSD
MFC after: 6 weeks


# 84380 02-Oct-2001 mjacob

Documentation comment: note that the each NIC's softc is assumed to start
with an ifnet structure.

MFC after: 1 week


# 83624 18-Sep-2001 jlemon

Add two fields to the ifnet structure indicating what extra capabilities
a network device has, and which ones are enabled.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 83130 06-Sep-2001 jlemon

Wrap array accesses in macros, which also happen to be lvalues:

ifnet_addrs[i - 1] -> ifaddr_byindex(i)
ifindex2ifnet[i] -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.


# 79103 02-Jul-2001 brooks

Add kernel infrastructure for network device cloning.

Reviewed by: ru, ume
Obtained from: NetBSD
MFC after: 1 week


# 74914 28-Mar-2001 jhb

Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>


# 72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# 72084 06-Feb-2001 phk

Convert if_multiaddrs from LIST to TAILQ so that it can be traversed
backwards in the three drivers which want to do that.

Reviewed by: mikeh


# 71791 29-Jan-2001 peter

Make the number of loopback interfaces dynamically tunable. Why one
would *want* to is a different story, but it used to be able to be done
statically. Get rid of #include "loop.h" and struct ifnet loif[NLOOP];
This could be used as an example of how to do this in other drivers,
for example: ccd.


# 69224 26-Nov-2000 jlemon

Unbreak world; #include <sys/mutex.h> instead of <machine/mutex.h>
Only include <sys/mbuf.h> when building kernel sources. This should
probably be changed to require callers to include it themselves.


# 69152 25-Nov-2000 jlemon

Lock down the network interface queues. The queue mutex must be obtained
before adding/removing packets from the queue. Also, the if_obytes and
if_omcasts fields should only be manipulated under protection of the mutex.

IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on
the queue. An IF_LOCK macro is provided, as well as the old (mutex-less)
versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which
needs them, but their use is discouraged.

Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF,
which takes care of locking/enqueue, and also statistics updating/start
if necessary.


# 67334 19-Oct-2000 joe

Augment the 'ifaddr' structure with a 'struct if_data' to keep
statistics on a per network address basis.

Teach the IPv4 and IPv6 input/output routines to log packets/bytes
against the network address connected to the flow.

Teach netstat to display the per-address stats for IP protocols
when 'netstat -i' is evoked, instead of displaying the per-interface
stats.


# 64651 14-Aug-2000 archie

Export the functionality of SIOCSIFLLADDR with if_setlladdr()
and add some more rigorous sanity checking in the process.

Reviewed by: freebsd-net


# 63090 13-Jul-2000 archie

Make all Ethernet drivers attach using ether_ifattach() and detach using
ether_ifdetach().

The former consolidates the operations of if_attach(), ng_ether_attach(),
and bpfattach(). The latter consolidates the corresponding detach operations.

Reviewed by: julian, freebsd-net


# 62264 29-Jun-2000 archie

Fix kernel build breakage when 'device ether' was not included.


# 62143 26-Jun-2000 archie

Make the ng_ether(4) node type dynamically loadable like the rest.
This means 'options NETGRAPH' is no longer necessary in order to get
netgraph-enabled Ethernet interfaces. This supports loading/unloading
the ng_ether.ko and attaching/detaching the Ethernet interface in any
order.

Add two new hooks 'upper' and 'lower' to allow access to the protocol
demux engine and the raw device, respectively. This enables bridging
to be defined as a netgraph node, if so desired.

Reviewed by: freebsd-net@freebsd.org


# 60938 26-May-2000 jake

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


# 60889 24-May-2000 archie

Just need to pass the address family to if_simloop(), not the whole sockaddr.


# 60833 23-May-2000 jake

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


# 58698 27-Mar-2000 jlemon

Add support for offloading IP/TCP/UDP checksums to NIC hardware which
supports them.


# 55205 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


# 53541 22-Nov-1999 shin

KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# 52904 05-Nov-1999 shin

KAME related header files additions and merges.
(only those which don't affect c source files so much)

Reviewed by: cvs-committers
Obtained from: KAME project


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 49459 06-Aug-1999 brian

Define IF_MAXMTU and IF_MINMTU and don't allow MTUs with
out-of-range values.

``comparison is always 0'' warnings are silly !

Ok'd by: wollman, dg
Advised by: bde


# 47254 16-May-1999 pb

PR: kern/10570
Submitted by: adrian@freebsd.org

Change reference count in struct ifaddr to a u_int, to be able
to handle more than 2^16 routes to the same interface.

Fix suggested by Andrew Bangs <andrewb@demon.net> in PR kern/10570.
Tested by <adrian@freebsd.org> and me under -current.


# 46568 06-May-1999 peter

Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.


# 45720 16-Apr-1999 peter

Bring the 'new-bus' to the i386. This extensively changes the way the
i386 platform boots, it is no longer ISA-centric, and is fully dynamic.
Most old drivers compile and run without modification via 'compatability
shims' to enable a smoother transition. eisa, isapnp and pccard* are
not yet using the new resource manager. Once fully converted, all drivers
will be loadable, including PCI and ISA.

(Some other changes appear to have snuck in, including a port of Soren's
ATA driver to the Alpha. Soren, back this out if you need to.)

This is a checkpoint of work-in-progress, but is quite functional.

The bulk of the work was done over the last few years by Doug Rabson and
Garrett Wollman.

Approved by: core


# 41879 16-Dec-1998 phk

Generalize the if_up() and if_down() functions under the names
if_route() and if_unroute().

This is first step towards sanitizing IFF_UP and IFF_RUNNING


# 36908 12-Jun-1998 julian

Go through the loopback code with a broom..
Remove lots'o'hacks.
looutput is now static.

Other callers who want to use loopback to allow shortcutting
should call the special entrypoint for this, if_simloop(), which is
specifically designed for this purpose. Using looutput for this purpose
was problematic, particularly with bpf and trying to keep track
of whether one should be using the charateristics of the loopback interface
or the interface (e.g. if_ethersubr.c) that was requesting the loopback.
There was a whole class of errors due to this mis-use each of which had
hacks to cover them up.

Consists largly of hack removal :-)


# 36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# 35210 15-Apr-1998 bde

Support compiling with `gcc -ansi'.


# 28845 27-Aug-1997 julian

Add a per-interface-address pointer to a function that can be supplied
by a protocol, to detirmine if an address matches the net this address
is part of. This is needed by protocols for which netmasks
"just don't work", for example appletalk.

Also add the code in appletalk to make use of this new feature.
Thsi fixes one of the longest standing bugs in appletalk.
The inability to talk to machines to which the path is via a router
which is on a different net, but the same netrange, as your interface.
Protocols that do not supply this function (e.g. IP) should not be affected.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 21434 08-Jan-1997 wollman

Fix a few oversights in the new multicast membership interface.


# 21404 07-Jan-1997 wollman

Checkpoint the beginnings of the new kernel interface for
multicast group memberships. This is not actually operative
at the moment (a lot of other code still needs to be changed), but
this seemed like a useful reference point to check in so that
others (i.e. Bill Fenner) have fair warning of where we are going.


# 21259 03-Jan-1997 wollman

Separate kernel-internal data structures from exposed user interface
to interfaces. (Amazing nobody had done this!)

More commits to fix up user-land to follow.