History log of /freebsd-9.3-release/sys/net/if.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 265292 03-May-2014 rmacklem

MFC: r264630
For NFS mounts using rsize,wsize=65536 over TSO enabled
network interfaces limited to 32 transmit segments, there
are two known issues.
The more serious one is that for an I/O of slightly less than 64K,
the net device driver prepends an ethernet header, resulting in a
TSO segment slightly larger than 64K. Since m_defrag() copies this
into 33 mbuf clusters, the transmit fails with EFBIG.
A tester indicated observing a similar failure using iSCSI.

The second less critical problem is that the network
device driver must copy the mbuf chain via m_defrag()
(m_collapse() is not sufficient), resulting in measurable overhead.

This patch reduces the default size of if_hw_tsomax
slightly, so that the first issue is avoided.
Fixing the second issue will require a way for the
network device driver to inform tcp_output() that it
is limited to 32 transmit segments.


# 255443 10-Sep-2013 des

Fix the length calculation for the final block of a sendfile(2)
transmission which could be tricked into rounding up to the nearest
page size, leaking up to a page of kernel memory. [13:11]

In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR
and SIOCSIFNETMASK at the socket layer rather than pass them on to the
link layer without validation or credential checks. [SA-13:12]

Prevent cross-mount hardlinks between different nullfs mounts of the
same underlying filesystem. [SA-13:13]

Security: CVE-2013-5666
Security: FreeBSD-SA-13:11.sendfile
Security: CVE-2013-5691
Security: FreeBSD-SA-13:12.ifioctl
Security: CVE-2013-5710
Security: FreeBSD-SA-13:13.nullfs
Approved by: so


# 253700 27-Jul-2013 rodrigc

Approved by: re (hrs, marius)

MFC 253346:

PR: 168520 170096
Submitted by: adrian, zec

Fix multiple kernel panics when VIMAGE is enabled in the kernel.
These fixes are based on patches submitted by Adrian Chadd and Marko Zec.

(1) Set curthread->td_vnet to vnet0 in device_probe_and_attach() just before calling
device_attach(). This fixes multiple VIMAGE related kernel panics
when trying to attach Bluetooth or USB Ethernet devices because
curthread->td_vnet is NULL.

(2) Set curthread->td_vnet in if_detach(). This fixes kernel panics when detaching networking
interfaces, especially USB Ethernet devices.

(3) Use VNET_DOMAIN_SET() in ng_btsocket.c

(4) In ng_unref_node() set curthread->td_vnet. This fixes kernel panics
when detaching Netgraph nodes.


# 252781 05-Jul-2013 andre

MFC r291296, r291297, r291393:

Allow drivers to specify a maximum TSP length in bytes if they
are limited in the amount of data they can handle at once.

Apply this to the netfront driver.

Spare fields in struct tcpcb and struct ifnet are used to keep
the structure sizes the same.


# 249132 05-Apr-2013 mav

MFC r227293 (by ed):
Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.

This means that their use is restricted to a single C file.


# 248895 29-Mar-2013 melifaro

Merge 248070.

Fix long-standing issue with interface routes being unprotected:
Use RTM_PINNED flag to mark route as immutable.
Forbid deleting immutable routes without special rtrequest1_fib() flag.
Adding interface address with prefix already in route table is handled
by atomically deleting old prefix and adding interface one.


# 236051 26-May-2012 thompsa

MFC r234487

Add linkstate to bridge(4), set the link to up when at least one underlying
interface is up, otherwise the link is down.

This, among other things, allows carp to work on a bridge.


# 233200 19-Mar-2012 jhb

MFC 229621:
Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.


# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 223735 03-Jul-2011 bz

Add infrastructure to allow all frames/packets received on an interface
to be assigned to a non-default FIB instance.

You may need to recompile world or ports due to the change of struct ifnet.

Submitted by: cjsp
Submitted by: Alexander V. Chernikov (melifaro ipfw.ru)
(original versions)
Reviewed by: julian
Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru)
MFC after: 2 weeks
X-MFC: use spare in struct ifnet


# 223625 28-Jun-2011 pluknet

Update ifc_len field of struct ifconf passed for the ioctl SIOCGIFCONF32
(i.e. under COMPAT_FREEBSD32) in case ifconf() returned success to match
the native SIOCGIFCONF behavior.

PR: kern/158369
Reported by: Paul Procacci <pprocacci att gmail com>
MFC after: 1 week


# 220317 04-Apr-2011 glebius

When removing ifnets, we should first remove the reference to ifnet
from the interface index, then decrease refcount, not vice versa.

Otherwise there is a race (reproducible) when if_free_internal()
contests on IFNET_WLOCK(), and we got a zero-refed ifnet in the
index for a long time. It may be picked by some other thread,
that runs ifnet_byindex_ref(), who takes the ifnet from index,
and bumps refcount. When reader drops the lock, if_free_internal()
proceeds with free. Then reader tries to free it a second time.


# 219819 21-Mar-2011 jeff

- Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.


# 218757 16-Feb-2011 bz

Mfp4 CH=177274,177280,177284-177285,177297,177324-177325

VNET socket push back:
try to minimize the number of places where we have to switch vnets
and narrow down the time we stay switched. Add assertions to the
socket code to catch possibly unset vnets as seen in r204147.

While this reduces the number of vnet recursion in some places like
NFS, POSIX local sockets and some netgraph, .. recursions are
impossible to fix.

The current expectations are documented at the beginning of
uipc_socket.c along with the other information there.

Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
Tested by: zec

Tested by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 2 weeks


# 218559 11-Feb-2011 bz

Mfp4 CH=177255:

Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS.

Change the syntax to match KASSERT() to allow more flexible panic
messages rather than having a printf with hardcoded arguments
before panic.

Adjust the few assertions we have to the new format (and enhance
the output).

Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb

MFC after: 2 weeks


# 217805 24-Jan-2011 jhb

Fix a LOR by dropping the global ifnet locks while allocating a new ifnet
table in if_grow(). The order of the SYSINIT's for ifnet state were swapped
so that the various locks were initialized before being used.

Reviewed by: pluknet, bz
MFC after: 2 weeks


# 217322 12-Jan-2011 mdf

sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.

Commit the net* piece.


# 215701 22-Nov-2010 dim

After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


# 215317 14-Nov-2010 dim

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.


# 214333 25-Oct-2010 bz

Factor out DDB commands from r204145, r204279 into if_debug.c for further
enhancements (1). Switch to a standard 2-clause BSD license for this (2).

Unfortunately we have to un-static the ifindex_table for this but do not
publicly export it.

Suggested by: rwatson (1) a while back.
Approved by: thompsa (2) for the change from r204279.
MFC after: 6 days


# 214136 21-Oct-2010 pluknet

Reshuffle SIOCGIFCONF32 handler from r155224.

- move all the chunks into one file, which allows to hide SIOCGIFCONF32
global definition as well.
- replace __amd64__ with proper COMPAT_FREEBSD32 around.
- handle 32bit capacity before going into the handler itself instead of
doing internal 32bit specific changes within it (e.g. as it's done for
SIOCGDEFIFACE32_IN6).
- use explicitely sized types for ABI compat.

Approved by: kib (mentor)
MFC after: 2 weeks


# 212425 10-Sep-2010 mdf

Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function. While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.


# 211283 13-Aug-2010 zec

When moving an ethernet ifnet from one vnet to another, destroy the
associated ng_ether netgraph node in the current vnet, and create a
new one in the target vnet.

Reviewed by: julian
MFC after: 3 days


# 211193 11-Aug-2010 will

Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by: bz
Approved by: ken (mentor)


# 211157 10-Aug-2010 will

Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default. Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by: bz, simon
Approved by: ken (mentor)
MFC after: 2 weeks


# 210532 27-Jul-2010 bz

Return NULL rather than 0 for a pointer.

MFC after: 3 days


# 208553 25-May-2010 qingli

This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after: 3 days


# 207554 03-May-2010 sobomax

Add new tunable 'net.link.ifqmaxlen' to set default send interface
queue length. The default value for this parameter is 50, which is
quite low for many of today's uses and the only way to modify this
parameter right now is to edit if_var.h file. Also add read-only
sysctl with the same name, so that it's possible to retrieve the
current value.

MFC after: 1 month


# 207369 29-Apr-2010 bz

MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days


# 206637 14-Apr-2010 delphij

When an underlying ioctl(2) handler returns an error, our ioctl(2)
interface considers that it hits a fatal error, and will not copyout
the request structure back for _IOW and _IOWR ioctls, keeping them
untouched.

The previous implementation of the SIOCGIFDESCR ioctl intends to
feed the buffer length back to userland. However, if we return
an error, the feedback would be defeated and ifconfig(8) would
trap into an infinite loop.

This commit changes SIOCGIFDESCR to set buffer field to NULL to
indicate the previous ENAMETOOLONG case.

Reported by: bschmidt
MFC after: 2 weeks


# 206470 11-Apr-2010 bz

In if_detach_internal() we cannot hold the af_data lock over the
dom_ifdetach() calls as they might sleep for callout_drain().
Do as we do in if_attachdomain1() [r121470] and handle
if_afdata_initialized earlier and call dom_ifdetach() unlocked.

Discussed with: rwatson
MFC after: 10 days


# 206469 11-Apr-2010 bz

In if_detach_internal() only try to do the detach run if if_attachdomain1()
has actually succeeded to initialize and attach. There is a theoretical
possibility to drop out early in if_attachdomain1() leaving the array
uninitialized if we cannot get the lock.

Discussed with: rwatson
MFC after: 10 days


# 204279 24-Feb-2010 bz

Use the DB_SHOW_ALL_COMMAND() macro to register the formerly 'show ifnets'
in the db_show_all_table as 'show all ifnets' and with that follow the
convention for showing complete lists.

Submitted by: thompsa
MFC after: 3 days


# 204145 20-Feb-2010 bz

Start to implement ifnet DDB support:
- 'show ifnets' prints a list of ifnet *s per virtual network stack,
- 'show ifnet <struct ifnet *>' prints fields matching the given ifp.

We do not yet print the complete set of fields and might want to
factor this out to an extra if_debug.c file in case this grows
a lot[1]. We may also want to grow 'show ifnet <if_xname>' support[1].

Sponsored by: ISPsystem
Suggested by: rwatson [1]
Reviewed by: rwatson
MFC after: 5 days


# 204142 20-Feb-2010 bz

Enhance a panic string to contain more useful debugging information.

Sponsored by: ISPsystem
Reviewed by: rwatson
MFC after: 5 days


# 203052 26-Jan-2010 delphij

Revised revision 199201 (add interface description capability as inspired
by OpenBSD), based on comments from many, including rwatson, jhb, brooks
and others.

Sponsored by: iXsystems, Inc.
MFC after: 1 month


# 202935 24-Jan-2010 syrinx

While flushing the multicast filter of an interface, do not zero the relevant
ifmultiaddr structures' reference to the parent interface, unless the parent
interface is really detaching. While here, program only link layer multicast
filters to a wlan's hardware parent interface.

PR: kern/142391, kern/142392
Reviewed by: sam, rpaolo, bms
MFC after: 1 week


# 202588 18-Jan-2010 thompsa

Declare a new EVENTHANDLER called iflladdr_event which signals that the L2
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.

The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.

PR: kern/142927
Submitted by: Nikolay Denev


# 201350 31-Dec-2009 brooks

The devices that supported EVFILT_NETDEV kqueue filters were removed in
r195175. Remove all definitions, documentation, and usage.

fifo_misc.c:
Remove all kqueue tests as fifo_io.c performs all those that
would have remained.

Reviewed by: rwatson
MFC after: 3 weeks
X-MFC note: don't change vlan_link_state() function signature


# 201196 29-Dec-2009 jhb

Change vlan interfaces to cope more usefully with the parent interface being
renamed. Previously the vlan interfaces would lose their configuration as if
the parent interface had been physically removed. Now vlan interfaces ignore
rename events.
- Add a new ifnet flag (IFF_RENAMING) that is set while an ifnet is being
renamed. This flag can be checked in ifnet departure/arrival event
handlers to treat rename events differently.
- Change the ifnet departure event handler in the if_vlan(4) driver to
ignore departure events due to a trunk interface being renamed.

Reviewed by: brooks, rwatson
MFC after: 1 week


# 199975 30-Nov-2009 jhb

Remove if_timer/if_watchdog now that they are no longer used. The space
used by if_timer is reserved for expanding if_index to an int in the
future.

Reviewed by: rwatson, brooks


# 199231 12-Nov-2009 delphij

Revert revision 199201 for now as it has introduced a kernel vulnerability
and requires more polishing.


# 199201 11-Nov-2009 delphij

Add interface description capability as inspired by OpenBSD.

MFC after: 3 months


# 197364 20-Sep-2009 qingli

A wrong variable is used when setting up the interface
address route, which broke source address selection in
some code paths.

Submitted by: noted by bz
Reviewed by: hrs
MFC after: immediately


# 197227 15-Sep-2009 qingli

Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by: bz
MFC after: immediately


# 196559 26-Aug-2009 rwatson

Add IFNET_HOLD reserved pointer value for the ifindex ifnet array,
which allows an index to be reserved for an ifnet without making
the ifnet available for management operations. Use this in if_alloc()
while the ifnet lock is released between initial index allocation and
completion of ifnet initialization.

Add ifindex_free() to centralize the implementation of releasing an
ifindex value. Use in if_free() and if_vmove(), as well as when
releasing a held index in if_alloc().

Reviewed by: bz
MFC after: 3 days


# 196553 25-Aug-2009 rwatson

Break out allocation of new ifindex values from if_alloc() and if_vmove(),
and centralize in a single function ifindex_alloc(). Assert the
IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc(). This does not
close all known races in this code.

Reviewed by: bz
MFC after: 3 days


# 196510 24-Aug-2009 rwatson

Make if_grow static -- it's not used outside of if.c, and with the
internals destined to change, it's better if it remains that way.

MFC after: 3 days


# 196504 24-Aug-2009 zec

When moving ifnets from one vnet to another, and the ifnet
has ifaddresses of AF_LINK type which thus have an embedded
if_index "backpointer", we must update that if_index backpointer
to reflect the new if_index that our ifnet just got assigned.

This change affects only options VIMAGE builds.

Submitted by: bz
Reviewed by: bz
Approved by: re (rwatson), julian (mentor)


# 196481 23-Aug-2009 rwatson

Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by: bz, julian
MFC after: 3 days


# 196230 14-Aug-2009 zec

Appease VNET_DEBUG - in if_vmove we temporarily switch i.e.
recurse from one vnet to another which is OK, so no need
to flood the console with warnings here.

Approved by: re (rwatson), julian (mentor)


# 196019 01-Aug-2009 rwatson

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# 195891 26-Jul-2009 bz

Make the in-kernel logic for the SIOCSIFVNET, SIOCSIFRVNET ioctls
(ifconfig ifN (-)vnet <jname|jid>) work correctly.

Move vi_if_move to if.c and split it up into two functions(*),
one for each ioctl.

In the reclaim case, correctly set the vnet before calling if_vmove.

Instead of silently allowing a move of an interface from the current
vnet to the current vnet, return an error. (*)

There is some duplicate interface name checking before actually moving
the interface between network stacks without locking and thus race
prone. Ideally if_vmove will correctly and automagically handle these
in the future.

Suggested by: rwatson (*)
Approved by: re (kib)


# 195837 23-Jul-2009 rwatson

Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:

- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
occur each time a network stack is instantiated and destroyed. In the
!VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
For the VIMAGE case, we instead use SYSINIT's to track their order and
properties on registration, using them for each vnet when created/
destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
previously, as well as its dependency scheme: we now just use the
SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
they want init functions to be called for each virtual network stack
rather than just once at boot, compiling down to DOMAIN_SET() in the
non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
of modinfo or DOMAIN_SET() for init/uninit events. In some cases,
convert modular components from using modevent to using sysinit (where
appropriate). In some cases, do minor rejuggling of SYSINIT ordering
to make room for or better manage events.

Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup)
Discussed with: jhb, bz, julian, zec
Reviewed by: bz
Approved by: re (VIMAGE blanket)


# 195769 19-Jul-2009 rwatson

Normalize field naming for struct vnet, fix two debugging printfs that
print them.

Reviewed by: bz
Approved by: re (kensmith, kib)


# 195760 19-Jul-2009 rwatson

Reimplement and/or implement vnet list locking by replacing a mostly
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use). Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.

Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions. Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.

Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.

Update various consumers of these KPIs based on whether they may sleep
or not.

Reviewed by: bz
Approved by: re (kib)


# 195741 17-Jul-2009 jamie

Remove the interim vimage containers, struct vimage and struct procg,
and the ioctl-based interface that supported them.

Approved by: re (kib), bz (mentor)


# 195727 16-Jul-2009 rwatson

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


# 195699 14-Jul-2009 rwatson

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# 195175 29-Jun-2009 brooks

Remove support for the /dev/net/* per-interface devices. They serve
little purpose and are unused in the base system.

The IOCTL functionality is entirely duplicated and routing sockets
provide a richer interface than the kqueue functionality.

Further, it is not practical for these devices to be made sensible in
the face of VIMAGE.

Bump __FreeBSD_version on the off chance that there is any code out
there that actually uses this stuff.

Reviewed by: rwatson
Discussed with: bz, zec
Approved by: re@ (kensmith)


# 195097 27-Jun-2009 rwatson

Remove unnecessary include of kdb.h that snuck in during ifaddr refcount
work.

Reported by: pluknet <pluknet at gmail.com>
Approved by: re (kib)


# 195020 25-Jun-2009 rwatson

Define four wrapper functions for interface address locking,
if_addr_rlock() and if_addr_runlock() for regular address lists, and
if_maddr_rlock() and if_maddr_runlock() for multicast address lists.

We will use these in various kernel modules to avoid encoding specific
type and locking strategy information into modules that currently use
IF_ADDR_LOCK() and IF_ADDR_UNLOCK() directly.

MFC after: 6 weeks


# 194821 24-Jun-2009 rwatson

In if_setlladdr(), use IF_ADDR_LOCK() and ifaddr references to improve
the safety of link layer address manipulation.

MFC after: 6 weeks


# 194760 23-Jun-2009 rwatson

Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references. The following routines now return references:

ifaddr_byindex
ifa_ifwithaddr
ifa_ifwithbroadaddr
ifa_ifwithdstaddr
ifa_ifwithnet
ifaof_ifpforaddr
ifa_ifwithroute
ifa_ifwithroute_fib
rt_getifa
rt_getifa_fib
IFP_TO_IA
ip_rtaddr
in6_ifawithifp
in6ifa_ifpforlinklocal
in6ifa_ifpwithaddr
in6_ifadd
carp_iamatch6
ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc). In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit. Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by: bz
Obtained from: Apple, Inc. (portions)
MFC after: 6 weeks (portions)


# 194700 23-Jun-2009 bz

Remove duplicate #include <net/route.h> from the middle of the file.


# 194640 22-Jun-2009 bz

Move virtualization of routing related variables into their own
Vimage module, which had been there already but now is stateful.

All variables are now file local; so this further limits the global
spreading of routing related things throughout the kernel.

Add a missing function local variable in case of MPATHing.

Reviewed by: zec


# 194622 22-Jun-2009 rwatson

Add a new function, ifa_ifwithaddr_check(), which rather than returning
a pointer to an ifaddr matching the passed socket address, returns a
boolean indicating whether one was present. In the (near) future,
ifa_ifwithaddr() will return a referenced ifaddr rather than a raw
ifaddr pointer, and the new wrapper will allow callers that care only
about the boolean condition to avoid having to free that reference.

MFC after: 3 weeks


# 194620 22-Jun-2009 bz

After the update to fxp(4) in r194573 we should no longer need
this DELAY(100) hack introduced in r56938.

Thanks to: yongari
MFC after: 6 weeks
X-MFC note: not before the fxp(4) changes


# 194602 21-Jun-2009 rwatson

Clean up common ifaddr management:

- Unify reference count and lock initialization in a single function,
ifa_init().
- Move tear-down from a macro (IFAFREE) to a function ifa_free().
- Move reference count bump from a macro (IFAREF) to a function ifa_ref().
- Instead of using a u_int protected by a mutex to refcount(9) for
reference count management.

The ifa_mtx is now used for exactly one ioctl, and possibly should be
removed.

MFC after: 3 weeks


# 194581 21-Jun-2009 rdivacky

Switch cmd argument to u_long. This matches what if_ethersubr.c does and
allows the code to compile cleanly on amd64 with clang.

Reviewed by: rwatson
Approved by: ed (mentor)


# 194259 15-Jun-2009 sam

r193336 moved ifq_detach to if_free which broke if_alloc followed
by if_free (w/o doing if_attach); move ifq_attach to if_alloc and
rename ifq_attach/detach to ifq_init/ifq_delete to better identify
their purpose

Reviewed by: jhb, kmacy


# 194251 15-Jun-2009 jamie

Manage vnets via the jail system. If a jail is given the boolean
parameter "vnet" when it is created, a new vnet instance will be created
along with the jail. Networks interfaces can be moved between prisons
with an ioctl similar to the one that moves them between vimages.
For now vnets will co-exist under both jails and vimages, but soon
struct vimage will be going away.

Reviewed by: zec, julian
Approved by: bz (mentor)


# 193983 11-Jun-2009 bz

carp(4) allows people to share a set of IP addresses and can only
use IPv4/v6 for inter-node communication (according to my reading).

Properly wrap the carp callouts in INET || INET6 and refelect this
in sys/conf/files as well. While in theory this should be ok,
it might be a bit optimistic to think that carp could build with
inet6 only[1].

Discussed with: mlaier [1]


# 193951 10-Jun-2009 kib

Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by: jhb [1]
Noted by: jhb [2]
Reviewed by: jhb
Tested by: rnoland


# 193744 08-Jun-2009 bz

After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.


# 193731 08-Jun-2009 zec

Introduce an infrastructure for dismantling vnet instances.

Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state. The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.

While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers. Many of such issues are
already known and will be incrementaly fixed over the next weeks in
smaller incremental commits.

Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels. Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.

Bump __FreeBSD_version to 800097.
Reviewed by: bz, julian
Approved by: rwatson, kib (re), julian (mentor)


# 193511 05-Jun-2009 rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# 193336 02-Jun-2009 sam

move ifq_detach from if_detach to if_free; this permits callers to
reference if_snd in the period between detach+free which helps simplify
detach code

Reviewed by: jhb, rwatson


# 193232 01-Jun-2009 bz

Convert the two dimensional array to be malloced and introduce
an accessor function to get the correct rnh pointer back.

Update netstat to get the correct pointer using kvm_read()
as well.

This not only fixes the ABI problem depending on the kernel
option but also permits the tunable to overwrite the kernel
option at boot time up to MAXFIBS, enlarging the number of
FIBs without having to recompile. So people could just use
GENERIC now.

Reviewed by: julian, rwatson, zec
X-MFC: not possible


# 193166 31-May-2009 zec

Introduce an interm userland-kernel API for creating vnets and
assigning ifnets from one vnet to another. Deletion of vnets is not
yet supported.

The interface is implemented as an ioctl extension so that no syscalls
had to be introduced. This should be acceptable given that the new
interface will be used for a short / interim period only, until the
new jail management framwork gains the capability of managing vnets.
This method for managing vimages / vnets has been in use for the past
7 years without any observable issues.

The userland tool to be used in conjunction with the interim API can be
found in p4: //depot/projects/vimage-commit2/src/usr.sbin/vimage/... and
will most probably never get commited to svn.

While here, bump copyright notices in kern_vimage.c and vimage.h to
cover work done in year 2009.

Approved by: julian (mentor)
Discussed with: bz, rwatson


# 192608 22-May-2009 zec

Set ifp->if_afdata_initialized to 0 while holding IF_AFDATA_LOCK on ifp,
not after the lock has been released.

Reviewed by: bz
Discussed with: rwatson


# 192605 22-May-2009 zec

Introduce the if_vmove() function, which will be used in the future
for reassigning ifnets from one vnet to another.

if_vmove() works by calling a restricted subset of actions normally
executed by if_detach() on an ifnet in the current vnet, and then
switches to the target vnet and executes an appropriate subset of
if_attach() actions there.

if_attach() and if_detach() have become wrapper functions around
if_attach_internal() and if_detach_internal(), where the later
variants have an additional argument, a flag indicating whether a
full attach or detach sequence is to be executed, or only a
restricted subset suitable for moving an ifnet from one vnet to
another. Hence, if_vmove() will not call if_detach() and if_attach()
directly, but will call the if_detach_internal() and
if_attach_internal() variants instead, with the vmove flag set.

While here, staticize ifnet_setbyindex() since it is not referenced
from outside of sys/net/if.c.

Also rename ifccnt field in struct vimage to ifcnt, and do some minor
whitespace garbage collection where appropriate.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by: bz, rwatson, brooks?
Approved by: julian (mentor)


# 191816 05-May-2009 zec

Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one. The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE(). Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by: julian (mentor)


# 191688 30-Apr-2009 zec

Permit buiding kernels with options VIMAGE, restricted to only a single
active network stack instance. Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables. As an example, V_ifnet becomes:

options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet
default build: vnet_net_0._ifnet
options VIMAGE_GLOBALS: ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

INIT_VNET_NET(ifp->if_vnet); becomes

struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals. If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet. options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod. SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by: bz, rwatson
Approved by: julian (mentor)


# 191424 23-Apr-2009 rwatson

As with ifnet_byindex_ref(), don't return IFF_DYING interfaces from
ifunit_ref(). ifunit() continues to return them.

MFC after: 3 weeks


# 191423 23-Apr-2009 rwatson

Add ifunit_ref(), a version of ifunit(), that returns not just an
interface pointer, but also a reference to it.

Modify ifioctl() to use ifunit_ref(), holding the reference until
all ioctls, etc, have completed.

This closes a class of reader-writer races in which interfaces
could be removed during long-running ioctls, leading to crashes.
Many other consumers of ifunit() should now use ifunit_ref() to
avoid similar races.

MFC after: 3 weeks


# 191418 23-Apr-2009 rwatson

During if_detach(), invoke if_dead() to set the ifnet's function
pointers to "dead" implementations that no-op rather than invoking
the device driver. This would generally be unexpected and
possibly quite badly handled by most device drivers after
if_detach() has completed.

Reviewed by: bms
MFC after: 3 weeks


# 191417 23-Apr-2009 rwatson

Move portions of data structure initialization from if_attach() to
if_alloc(), and portions of data structure destruction from if_detach()
to if_free(). These changes leave more of the struct ifnet in a
safe-to-access condition between alloc and attach, and between detach
and free, and focus on attach/detach as stack usage events rather than
data structure initialization.

Affected fields include the linkstate task queue, if_afdata lock,
address lists, kqueue state, and MAC labels. ifq_attach() ifq_detach()
are not moved as ifq_attach() may use a queue length set by the device
driver between if_alloc() and if_attach().

MFC after: 3 weeks


# 191416 23-Apr-2009 rwatson

Add a new interface flag, IFF_DYING, which is set when a device driver
calls if_free(), and remains set if the refcount is elevated. IF_DYING
skips the bit in the if_flags bitmask previously used by IFF_NEEDSGIANT,
so that an MFC can be done without changing which bit is used, as
IFF_NEEDSGIANT is still present in 7.x.

ifnet_byindex_ref() checks for IFF_DYING and returns NULL if it is set,
preventing new references from by acquired by index, preventing
monitoring sysctls from seeing it. Other lookup mechanisms currently
do not check IFF_DYING, but may need to in the future.

MFC after: 3 weeks


# 191367 21-Apr-2009 rwatson

Start to address a number of races relating to use of ifnet pointers
after the corresponding interface has been destroyed:

(1) Add an ifnet refcount, ifp->if_refcount. Initialize it to 1 in
if_alloc(), and modify if_free_type() to decrement and check the
refcount.

(2) Add new if_ref() and if_rele() interfaces to allow kernel code
walking global interface lists to release IFNET_[RW]LOCK() yet
keep the ifnet stable. Currently, if_rele() is a no-op wrapper
around if_free(), but this may change in the future.

(3) Add new ifnet field, if_alloctype, which caches the type passed
to if_alloc(), but unlike if_type, won't be changed by drivers.
This allows asynchronous free's of the interface after the
driver has released it to still use the right type. Use that
instead of the type passed to if_free_type(), but assert that
they are the same (might have to rethink this if that doesn't
work out).

(4) Add a new ifnet_byindex_ref(), which looks up an interface by
index and returns a reference rather than a pointer to it.

(5) Fix if_alloc() to fully initialize the if_addr_mtx before hooking
up the ifnet to global lists.

(6) Modify sysctls in if_mib.c to use ifnet_byindex_ref() and release
the ifnet when done.

When this change is MFC'd, it will need to replace if_ispare fields
rather than adding new fields in order to avoid breaking the binary
interface. Once this change is MFC'd, if_free_type() should be
removed, as its 'type' argument is now optional.

This refcount is not appropriate for counting mbuf pkthdr references,
and also not for counting entry into the device driver via ifnet
function pointers. An rmlock may be appropriate for the latter.
Rather, this is about ensuring data structure stability when reaching
an ifnet via global ifnet lists and tables followed by copy in or out
of userspace.

MFC after: 3 weeks
Reported by: mdtancsa
Reviewed by: brooks


# 191365 21-Apr-2009 rwatson

Acquire the interface address list lock over some iterations over
if_addrhead. This closes some reader-writer races associated with
the address list.

MFC after: 2 weeks


# 191161 16-Apr-2009 kmacy

export if_qflush for use by driver if_qflush routines
only set ifp->if_{transmit, qflush} if not already set
KASSERT that neither or both are set


# 191112 15-Apr-2009 zec

In the !VIMAGE_GLOBALS case, make sure not to call vnet_net_iattach()
both via the vnet_mod_register() framework and then directly, but only
once.

Reviewed by: bz
Approved by: julian (mentor)


# 191037 14-Apr-2009 kmacy

call default if_qflush on ifq if default method isn't used by the driver


# 190909 11-Apr-2009 zec

Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized. In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw). Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module. The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on. Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first. The framework will panic if it detects any unresolved
dependencies before completing system initialization. Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run. In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container. IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by: bz
Approved by: julian (mentor)


# 190903 10-Apr-2009 mlaier

Follow up for r190895 It's not only the "all" group that is affected, but
all groups on the given interface.

PR: kern/130977, kern/131310
MFC after: 3 days (%vnet)


# 190895 10-Apr-2009 mlaier

Remove interfaces from IFG_ALL on detach. This cures a couple of pf panics
when using the "self" keyword in tables or as ()-style host address and
fixes "ifconfig -g all" output.

PR: kern/130977, kern/131310
Submitted by: Mikolaj Golub
MFC after: 3 days


# 190787 06-Apr-2009 zec

First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

At this stage, the new per-vnet initializer functions are
directly called from the existing global initialization code,
which should in most cases result in compiler inlining those
new functions, hence yielding a near-zero functional change.

Modify the existing initializer functions which are invoked via
protosw, like ip_init() et. al., to allow them to be invoked
multiple times, i.e. per each vnet. Global state, if any,
is initialized only if such functions are called within the
context of vnet0, which will be determined via the
IS_DEFAULT_VNET(curvnet) check (currently always true).

While here, V_irtualize a few remaining global UMA zones
used by net/netinet/netipsec networking code. While it is
not yet clear to me or anybody else whether this is the right
thing to do, at this stage this makes the code more readable,
and makes it easier to track uncollected UMA-zone-backed
objects on vnet removal. In the long run, it's quite possible
that some form of shared use of UMA zone pools among multiple
vnets should be considered.

Bump __FreeBSD_version due to changes in layout of structs
vnet_ipfw, vnet_inet and vnet_net.

Approved by: julian (mentor)


# 190508 28-Mar-2009 sam

enable setting the mac address of 802.11 devices


# 190151 20-Mar-2009 jamie

Call the interface's if_ioctl from ifioctl(), if the protocol didn't
handle the ioctl. There are other paths that already call it, but this
allows for a non-interface socket (like AF_LOCAL which ifconfig now
uses) to use a broader class of interface ioctls.

Approved by: bz (mentor), rwatson


# 189851 15-Mar-2009 rwatson

Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free. This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile. They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

if_ar
if_axe
if_aue
if_cdce
if_cue
if_kue
if_ray
if_rue
if_rum
if_sr
if_udav
if_ural
if_zyd

Drivers that were already disabled because of tty changes:

if_ppp
if_sl

Discussed on: arch@


# 189800 14-Mar-2009 sam

remove stray ;


# 189106 27-Feb-2009 bz

For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.


# 188144 05-Feb-2009 jamie

Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise. The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family. For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes). Remove the explicit
jailed() checks that preceded many of the function calls.

Approved by: bz (mentor)


# 187648 23-Jan-2009 jhb

Only start the if_slowtimo timer (which drives the if_watchdog methods of
network interfaces) if we have at least one interface with an if_watchdog
routine.

MFC after: 2 weeks


# 186275 18-Dec-2008 kmacy

if_rtdel is always called with the RADIX_NODE_HEAD lock held


# 186266 18-Dec-2008 kmacy

add ifnet_byindex_locked to allow for use of IFNET_RLOCK


# 186209 17-Dec-2008 kmacy

avoid trying to acquire a shared lock while holding an exclusive lock
by making the ifnet lock acquisition exclusive


# 186199 16-Dec-2008 kmacy

convert ifnet and afdata locks from mutexes to rwlocks


# 186119 15-Dec-2008 qingli

This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle, Kip has also been conducting
active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
me maintaining that branch before the svn conversion


# 185931 11-Dec-2008 bz

Whitespace changes only - tabs must have been converted to spaces
somehow, when moving the code from p4 to svn.

Sponsored by: The FreeBSD Foundation


# 185895 10-Dec-2008 zec

Conditionally compile out V_ globals while instantiating the appropriate
container structures, depending on VIMAGE_GLOBALS compile time option.

Make VIMAGE_GLOBALS a new compile-time option, which by default will not
be defined, resulting in instatiations of global variables selected for
V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be
effectively compiled out. Instantiate new global container structures
to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0,
vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.

Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_
macros resolve either to the original globals, or to fields inside
container structures, i.e. effectively

#ifdef VIMAGE_GLOBALS
#define V_rt_tables rt_tables
#else
#define V_rt_tables vnet_net_0._rt_tables
#endif

Update SYSCTL_V_*() macros to operate either on globals or on fields
inside container structs.

Extend the internal kldsym() lookups with the ability to resolve
selected fields inside the virtualization container structs. This
applies only to the fields which are explicitly registered for kldsym()
visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently
this is done only in sys/net/if.c.

Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code,
and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in
turn result in proper code being generated depending on VIMAGE_GLOBALS.

De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c
which were prematurely V_irtualized by automated V_ prepending scripts
during earlier merging steps. PF virtualization will be done
separately, most probably after next PF import.

Convert a few variable initializations at instantiation to
initialization in init functions, most notably in ipfw. Also convert
TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in
initializer functions.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 185810 09-Dec-2008 bz

It does not make much sense to include net/route.h twice.
Remove one #include.


# 185808 09-Dec-2008 bz

Add rwlock.h (and lock.h for that) to keep no-INET kernels compiling
after RADIX_NODE_HEAD_{,UN}LOCK() were added. Must have been "learned"
by pollution before (most likely: route.h -> radix.h -> rwlock.h)


# 185571 02-Dec-2008 bz

Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


# 185435 29-Nov-2008 bz

MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addtion to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking,..

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.

Bump __FreeBSD_version for the afore mentioned and in kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
on cluster machines as well as all the testers and people
who provided feedback the last months on freebsd-jail and
other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by: (see above)
MFC after: 3 months (this is just so that I get the mail)
X-MFC Before: 7.2-RELEASE if possible


# 185348 26-Nov-2008 zec

Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 185253 24-Nov-2008 sam

use consistent style


# 185162 22-Nov-2008 kmacy

- bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
allow drivers to efficiently manage multiple hardware queues
(i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.


# 185088 19-Nov-2008 zec

Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions. As a rule,
initialization at instatiation for such variables should never be
introduced again from now on. Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact. In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 184726 06-Nov-2008 bz

Include if_arp.h for IFP2AC so that the netgraph parts in if.c
are happy even if compiled without INET or INET6.

MFC after: 2 months


# 184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


# 183550 02-Oct-2008 zec

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 183397 27-Sep-2008 ed

Replace all calls to minor() with dev2unit().

After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by: kib


# 183381 26-Sep-2008 ed

Remove unit2minor() use from kernel code.

When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macro's.
The unit2minor() and minor2unit() macro's were no-ops.

We'd better not remove these four macro's from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by: kib


# 182106 24-Aug-2008 bz

Make the checks for ptp interfaces in ifa_ifwithdstaddr() and
ifa_ifwithnet() look more similar by comparing the pointer to NULL
in both cases.

MFC after: 3 months


# 181900 20-Aug-2008 thompsa

ifnet_setbyindex() is only used locally, go back to being static.


# 181887 19-Aug-2008 julian

A bunch of formatting fixes brough to light by, or created by the Vimage commit
a few days ago.


# 181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# 180042 26-Jun-2008 rwatson

Introduce locking around use of ifindex_table, whose use was previously
unsynchronized. While races were extremely rare, we've now had a
couple of reports of panics in environments involving large numbers of
IPSEC tunnels being added very quickly on an active system.

- Add accessor functions ifnet_byindex(), ifaddr_byindex(),
ifdev_byindex() to replace existing accessor macros. These functions
now acquire the ifnet lock before derefencing the table.
- Add IFNET_WLOCK_ASSERT().
- Add static accessor functions ifnet_setbyindex(), ifdev_setbyindex(),
which set values in the table either asserting of acquiring the ifnet
lock.
- Use accessor functions throughout if.c to modify and read
ifindex_table.
- Rework ifnet attach/detach to lock around ifindex_table modification.

Note that these changes simply close races around use of ifindex_table,
and make no attempt to solve the probem of disappearing ifnets. Further
refinement of this work, including with respect to ifindex_table
resizing, is still required.

In a future change, the ifnet lock should be converted from a mutex to an
rwlock in order to reduce contention.

Reviewed and tested by: brooks


# 179066 17-May-2008 brooks

The if_check() function performed three actions:
- verified that the ifp->if_snd.ifq_mtx was initalized for
all attached interfaces. This was pointless because it was
initalized for all interfaces in if_attach() so I've removed it.
- Checked that ifp->if_snd.ifq_maxlen is initalized and set it to
ifqmaxlen if unset. This makes more sense in if_attach() so
I moved it there.
- The first call of if_slowtimo(). Delete if_check() and call
if_slowtimo() directly from the SYSINIT().


# 178888 09-May-2008 julian

Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as a EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extent that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility call setfib
that acts a bit like nice..

setfib -3 ping target.example.com # will use fib 3 for ping.

It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.

2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being reponded to.

6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect withthe proces that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.

Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes
accordingly.

ipfw has grown 2 new keywords:

setfib N ip from anay to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed it raises the possibilty of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the desination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco


# 178323 19-Apr-2008 brooks

Delay the global registration of the struct ifnet in if_alloc() until after
we're certain the allocation will entierly succeed. This fixes a leak in a
fairly unlikely case.

Reported by: vijay singh <vijjus at rocketmail dot com>
MFC after: 1 week


# 177617 25-Mar-2008 sam

expose if_purgemaddrs, it will be used by the vap code unless someone
redesigns the mcast support code in the next few weeks

MFC after: 3 weeks


# 177253 16-Mar-2008 rwatson

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink


# 176906 07-Mar-2008 rwatson

Move IFF_NEEDSGIANT warning from if_ethersubr.c to if.c so it is displayed
for all network interfaces, not just ethernet-like ones.

Upgrade it to a louder WARNING and be explicit that the flag is obsolete.
Support for IFF_NEEDSGIANT will be removed in a few months (see arch@ for
details) and will not appear in 8.0.

Upgrade if_watchdog to a WARNING.


# 172930 24-Oct-2007 rwatson

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# 171613 27-Jul-2007 rwatson

First in a series of changes to remove the now-unused Giant compatibility
framework for non-MPSAFE network protocols:

- Remove debug_mpsafenet variable, sysctl, and tunable.
- Remove NET_NEEDS_GIANT() and associate SYSINITSs used by it to force
debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel.
- Remove logic to automatically flag interrupt handlers as non-MPSAFE if
debug.mpsafenet is set for an INTR_TYPE_NET handler.
- Remove logic to automatically flag netisr handlers as non-MPSAFE if
debug.mpsafenet is set.
- Remove references in a few subsystems, including NFS and Cronyx drivers,
which keyed off debug_mpsafenet to determine various aspects of their own
locking behavior.
- Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into
no-op's, as their entire behavior was determined by the value in
debug_mpsafenet.
- Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE.

Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still
present in subsystems, and will be removed in followup commits.

Reviewed by: bz, jhb
Approved by: re (kensmith)


# 169619 16-May-2007 brooks

Update the comments on if_alloc(), if_free(), if_free_type(), and
if_attach.

Remove a comment about pre-3.0 network drivers from if_attach().

Be a bit more consistant about whitespace near comments.


# 168793 16-Apr-2007 thompsa

Rename the trunk(4) driver to lagg(4) as it is too similar to vlan trunking.

The name trunk is misused as the networking term trunk means carrying multiple
VLANs over a single connection. The IEEE standard for link aggregation (802.3
section 3) does not talk about 'trunk' at all while it is used throughout IEEE
802.1Q in describing vlans.

The lagg(4) driver provides link aggregation, failover and fault tolerance.

Discussed on: current@


# 168561 09-Apr-2007 thompsa

Add the trunk(4) driver for providing link aggregation, failover and fault
tolerance. This driver allows aggregation of multiple network interfaces as
one virtual interface using a number of different protocols/algorithms.

failover - Sends traffic through the secondary port if the master becomes
inactive.
fec - Supports Cisco Fast EtherChannel.
lacp - Supports the IEEE 802.3ad Link Aggregation Control Protocol
(LACP) and the Marker Protocol.
loadbalance - Static loadbalancing using an outgoing hash.
roundrobin - Distributes outgoing traffic using a round-robin scheduler
through all active ports.

This code was obtained from OpenBSD and this also includes 802.3ad LACP support
from agr(4) in NetBSD.


# 167943 27-Mar-2007 bms

Fix a case where hardware removal of an interface caused an attempt to
announce an ll_ifma which has gone away. Add a KASSERT to catch regressions.

Bug found by: Tom Uffner


# 167732 20-Mar-2007 bms

Fix tinderbox; ng_ether needs to see if_findmulti().


# 167729 19-Mar-2007 bms

Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.

This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.

With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.

Obtained from: p4 branch bms_netdev
Reviewed by: andre
Sponsored by: Garance A Drosehn
Idea from: NetBSD
MFC after: 1 month


# 166879 21-Feb-2007 bms

Fix a bug in if_findmulti(), whereby it would not find (and thus delete)
a link-layer multicast group membership.
Such memberships are needed in order to support protocols such as
IS-IS without putting the interface into PROMISC or ALLMULTI modes.

sa_equal() is not OK for comparing sockaddr_dl as it has deeper structure
than a simple byte array, so add sa_dl_equal() and use that instead.

Reviewed by: rwatson
Verified with: /usr/sbin/mtest
Bug found by: Jouke Witteveen
MFC after: 2 weeks


# 164772 30-Nov-2006 glebius

The recent issues with em(4) interface has shown that the old 4.4BSD
if_watchdog/if_timer interface doesn't fit modern SMP network
stack design.

Device drivers that need watchdog to monitor their hardware should
implement it theirselves.

Eventually the if_watchdog/if_timer API will be removed. For now,
warn that driver uses it.

Reviewed by: scottl


# 164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 163606 22-Oct-2006 rwatson

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# 162068 06-Sep-2006 andre

Fix the socket option IP_ONESBCAST by giving it its own case in ip_output()
and skip over the normal IP processing.

Add a supporting function ifa_ifwithbroadaddr() to verify and validate the
supplied subnet broadcast address.

PR: kern/99558
Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru>
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days


# 160195 09-Jul-2006 sam

Revise network interface cloning to take an optional opaque
parameter that can specify configuration parameters:
o rev cloner api's to add optional parameter block
o add SIOCCREATE2 that accepts parameter data
o rev vlan support to use new api (maintain old code)

Reviewed by: arch@


# 160038 29-Jun-2006 yar

There is a consensus that ifaddr.ifa_addr should never be NULL,
except in places dealing with ifaddr creation or destruction; and
in such special places incomplete ifaddrs should never be linked
to system-wide data structures. Therefore we can eliminate all the
superfluous checks for "ifa->ifa_addr != NULL" and get ready
to the system crashing honestly instead of masking possible bugs.

Suggested by: glebius, jhb, ru


# 159822 21-Jun-2006 glebius

- First initialize ifnet, and then insert it into global
list.
- First remove from global list, then start destroying.

PR: kern/97679
Submitted by: Alex Lyashkov <shadow itt.net.ru>
Reviewed by: rwatson, brooks


# 159781 19-Jun-2006 mlaier

Import interface groups from OpenBSD. This allows to group interfaces in
order to - for example - apply firewall rules to a whole group of
interfaces. This is required for importing pf from OpenBSD 3.9

Obtained from: OpenBSD (with changes)
Discussed on: -net (back in April)


# 159528 11-Jun-2006 fjoe

Fix KASSERT conditions in if_deregister_com_alloc().


# 159126 31-May-2006 thompsa

Announce all interfaces to devd on attach/detach. This adds a new devctl
notification so all interfaces including pseudo are reported. When netif
creates the clones at startup devctl_disable has not been turned off yet so the
interfaces will not be initialised twice, enforce this by adding an explicit
order between rc.d/netif and rc.d/devd.

This change allows actions to taken in userland when an interface is cloned
and the pseudo interface will be automatically configured if a ifconfig_<int>=""
line exists in rc.conf.

Reviewed by: brooks
No objections on: net


# 156948 21-Mar-2006 glebius

No direct call to carp_ifdetach() anymore. It is called by
event handler.

PR: kern/82908
Submitted by: Dan Lukes <dan obluda.cz>


# 155224 02-Feb-2006 ps

Implement SIOCGIFCONF for 32bit binaries.


# 155051 30-Jan-2006 glebius

Merge the //depot/user/yar/vlan branch into CVS. It contains some collective
work by yar, thompsa and myself. The checksum offloading part also involves
work done by Mihail Balikov.

The most important changes:

o Instead of global linked list of all vlan softc use a per-trunk
hash. The size of hash is dynamically adjusted, depending on
number of entries. This changes struct ifnet, replacing counter
of vlans with a pointer to trunk structure. This change is an
improvement for setups with big number of VLANs, several interfaces
and several CPUs. It is a small regression for a setup with a single
VLAN interface.
An alternative to dynamic hash is a per-trunk static array with
4096 entries, which is a compile time option - VLAN_ARRAY. In my
experiments the array is not an improvement, probably because such
a big trunk structure doesn't fit into CPU cache.
o Introduce an UMA zone for VLAN tags. Since drivers depend on it,
the zone is declared in kern_mbuf.c, not in optional vlan(4) driver.
This change is a big improvement for any setup utilizing vlan(4).
o Use rwlock(9) instead of mutex(9) for locking. We are the first
ones to do this! :)
o Some drivers can do hardware VLAN tagging + hardware checksum
offloading. Add an infrastructure for this. Whenever vlan(4) is
attached to a parent or parent configuration is changed, the flags
on vlan(4) interface are updated.

In collaboration with: yar, thompsa
In collaboration with: Mihail Balikov <mihail.balikov interbgc.com>


# 154708 23-Jan-2006 yar

Be consistent in checking ifa->ifa_addr for NULL.

Found by: Coverity Prevent (tm)
MFC after: 3 days


# 152315 11-Nov-2005 ru

- Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.

- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.


# 152296 11-Nov-2005 ru

- Make IFP2ENADDR() a pointer to IF_LLADDR() rather than another
copy of Ethernet address.

- Change iso88025_ifattach() and fddi_ifattach() to accept MAC
address as an argument, similar to ether_ifattach(), to make
this work.


# 150845 03-Oct-2005 yar

Clean up consistency checks in if_setflag():
. use KASSERT for all checks so that the source of an error can be detected;
. use __func__ instead of spelling function name each time;
. fix a typo.


# 150844 02-Oct-2005 yar

Log a message about entering or leaving permanently promiscuous mode,
as it is done for usual promiscuous mode already. This info is important
because promiscuous mode in the hands of a malicious party can jeopardize
the whole network.


# 150296 18-Sep-2005 rwatson

Take a first cut at cleaning up ifnet removal and multicast socket
panics, which occur when stale ifnet pointers are left in struct
moptions hung off of inpcbs:

- Add in_ifdetach(), which matches in6_ifdetach(), and allows the
protocol to perform early tear-down on the interface early in
if_detach().

- Annotate that if_detach() needs careful consideration.

- Remove calls to in_pcbpurgeif0() in the handling of SIOCDIFADDR --
this is not the place to detect interface removal! This also
removes what is basically a nasty (and now unnecessary) hack.

- Invoke in_pcbpurgeif0() from in_ifdetach(), in both raw and UDP
IPv4 sockets.

It is now possible to run the msocket_ifnet_remove regression test
using HEAD without panicking.

MFC after: 3 days


# 150063 12-Sep-2005 rwatson

In netkqfilter(), return EINVAL instead of 1 (EPERM) when a filter type
is requested on a network interface file descriptor that is non-applicable.

MFC after: 3 days


# 149782 04-Sep-2005 sam

reclaim sbuf and clear lock on error in ifconf

Submitted by: Ted Unangst
Reviewed by: rwatson
MFC after: 3 days


# 149243 18-Aug-2005 brooks

When we started calling if_findindex() from if_alloc() with an empty
struct ifnet most of if_findindex() become a complex no-op. Remove it
and replace it with a corrected version of the four line for loop it
devolved to plus some error handling. This should probably be replaced
with subr_unit at some point.

Switch from checking ifaddr_byindex to ifnet_byindex when looking for
empty indexes. Since we're doing this from if_alloc/if_free, we can
only be sure that ifnet_byindex will be correct. This fixes panics when
loading the ef(4) module. The panics were caused by the fact that
if_alloc was called four time before if_attach was called and thus
ifaddr_byindex was not set and the same unit was allocated again. This
in turn caused the first if_attach to fail because the ifp was not the
one in ifnet_byindex(ifp->if_index).

Reported by: "Wojciech A. Koszek" <dunstan at freebsd dot czest dot pl>
PR: kern/84987
MFC After: 1 day


# 149141 16-Aug-2005 brooks

- Move IF_ADDR_LOCK_DESTROY(ifp) from if_free to if_free_type.
- Add a note that additions should be made to if_free_type and not
if_free to help avoid this in the future.

This apparently fixes a use after free in if_bridge and may fix bugs
in other direct if_free_type consumers.

Reported by: thompsa


# 148886 09-Aug-2005 rwatson

Rename IFF_RUNNING to IFF_DRV_RUNNING, IFF_OACTIVE to IFF_DRV_OACTIVE,
and move both flags from ifnet.if_flags to ifnet.if_drv_flags, making
and documenting the locking of these flags the responsibility of the
device driver, not the network stack. The flags for these two fields
will be mutually exclusive so that they can be exposed to user space as
though they were stored in the same variable.

Provide #defines to provide the old names #ifndef _KERNEL, so that user
applications (such as ifconfig) can use the old flag names. Using the
old names in a device driver will result in a compile error in order to
help device driver writers adopt the new model.

When exposing the interface flags to user space, via interface ioctls
or routing sockets, or the two fields together. Since the driver flags
cannot currently be set for user space, no new logic is currently
required to handle this case.

Add some assertions that general purpose network stack routines, such
as if_setflags(), are not improperly used on driver-owned flags.

With this change, a large number of very minor network stack races are
closed, subject to correct device driver locking. Most were likely
never triggered.

Driver sweep to follow; many thanks to pjd and bz for the line-by-line
review they gave this patch.

Reviewed by: pjd, bz
MFC after: 7 days


# 148799 06-Aug-2005 sam

destroy lock _before_ free'ing the structure it resides in


# 148696 04-Aug-2005 jhb

Initialize the if_addr mutex in if_alloc() rather than waiting until
if_attach(). This allows ethernet drivers to use it in their routines
to program their MAC filters before ether_ifattach() is called (de(4) is
one such driver). Also, the if_addr mutex is destroyed in if_free()
rather than if_detach(), so there was another potential bug in that a
driver that failed during attach and called if_free() without having
called ether_ifattach() would have tried to destroy an uninitialized mutex.

Reported by: Holm Tiffe holm at freibergnet dot de
Discussed with: rwatson


# 148652 02-Aug-2005 rwatson

Protect link layer network interface multicast address list manipulation
using ifp->if_addr_mtx:

- Initialize if_addr_mtx when ifnet is initialized.

- Destroy if_addr_mtx when ifnet is torn down.

- Rename ifmaof_ifpforaddr() to if_findmulti(); assert if_addr_mtx.
Staticize.

- Extract ifmultiaddr allocation and initialization into if_allocmulti();
accept a 'mflags' argument to indicate whether or not sleeping is
permitted. This centralizes error handling and address duplication.

- Extract ifmultiaddr tear-down and deallocation in if_freemulti().

- Re-structure if_addmulti() to hold if_addr_mtx around manipulation of
the ifnet multicast address list and reference count manipulation.
Make use of non-sleeping allocations. Annotate the fact that we only
generate routing socket events for explicit address addition, not
implicit link layer address addition.

- Re-structure if_delmulti() to hold if_addr_mtx around manipulation of
the ifnet multicast address list and reference count manipulation.
Annotate the lack of a routing socket event for implicit link layer
address removal.

- De-spl all and sundry.

Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after: 1 week


# 148153 19-Jul-2005 rwatson

In multicast routines:

Compare pointers with NULL rather than treating them as booleans.

Compare pointers with NULL rather than 0 to make it more clear
they are pointers.

Assign pointers value of NULL rather than 0 to make it more clear
they are pointers.

MFC after: 3 days


# 148152 19-Jul-2005 rwatson

Rename equal() macro to sa_equal(), which matches the definitions
of sa_equal() in other files, and makes it more clear what equal()
is comparing.

MFC after: 3 days


# 148010 14-Jul-2005 mlaier

Move eventhandler for 'ifnet_departure_event' at the end of the progress.
Some of the (IPv6) cleanup functions send packets to inform peers of the
departure. These packets confused users of ifnet_departure_event (pf at the
moment).

PR: kern/80627
Tested by: Divacky Roman
MFC after: 1 week


# 147986 14-Jul-2005 yar

MFp4:

- Introduce a helper function if_setflag() containing the code common
to ifpromisc() and if_allmulti() instead of duplicating the code poorly,
with different bugs.
- Call ifp->if_ioctl() in a consistent way: always use more compatible C
syntax and check whether ifp->if_ioctl is not NULL prior to the call.

MFC after: 1 month


# 147730 01-Jul-2005 ssouhlal

Fix the recent panics/LORs/hangs created by my kqueue commit by:

- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by: jmg
Approved by: re (scottl), scottl (mentor override)
Pointyhat to: ssouhlal
Will be happy: everyone


# 147470 17-Jun-2005 brooks

Spelling/grammer fixes in comment.

Reported by: Hans Petter Selasky <hselasky at c2i dot net>
Approved by: re (ifnet blanked)


# 147308 11-Jun-2005 brooks

Return NULL instead of a bogus pointer from if_alloc when if_com_alloc
fails.

Move detaching the ifnet from the ifindex_table into if_free so we can
both keep the sanity checks and actually delete the ifnets. [0]

Reported by: gallatin [0]
Approved by: re (blanket)


# 147256 10-Jun-2005 brooks

Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.

Reviewed by: sobomax, sam


# 147059 06-Jun-2005 brooks

Send link state change notifications to /dev/devctl. This is needed to
start the OpenBSD dhclient when links come up.


# 146986 05-Jun-2005 thompsa

Add hooks into the networking layer to support if_bridge. This changes struct
ifnet so a buildworld is necessary.

Approved by: mlaier (mentor)
Obtained from: NetBSD


# 146620 25-May-2005 peadar

Separate out address-detaching part of if_detach into if_purgeaddrs,
so if_tap doesn't need to rely on locally-rolled code to do same.

The observable symptom of if_tap's bzero'ing the address details
was a crash in "ifconfig tap0" after an if_tap device was closed.

Reported By: Matti Saarinen (mjsaarin at cc dot helsinki dot fi)


# 145320 20-Apr-2005 glebius

Do not call all link state callbacks directly, but schedule
a taskqueue(9) task. This fixes LORs and adds possibility
to serve such events pseudorecursively, when link state
change of interface causes subsequent change on other
interfaces.

Sponsored by: Rambler
Reviewed by: sam, brooks, mux


# 145095 14-Apr-2005 cperciva

Zero the ifr.ifr_name buffer in ifconf() in order to avoid
accidental disclosure of kernel memory to userland.

Security: FreeBSD-SA-05:04.ifconf


# 143881 20-Mar-2005 glebius

ifma_protospec is a pointer. Use NULL when assigning or compating it.


# 143464 12-Mar-2005 glebius

Add a sysctl net.link.log_link_state_change, which allows to
suppress logging of interface link state changes.

Requested by: sam, kan


# 142501 25-Feb-2005 brooks

Change the definition of struct if_data's member ifi_epoch from wall
clock time to uptime because wall clock time may go backwards.

This is a change in the API which will impact SNMP agents who are using
ifi_epoch to set RFC2233's ifCounterDiscontinuityTime. None are know to
exist today. This will not impact applications that are using the
<index, epoch> tuple to verify interface uniqueness except that it
eliminates a race which could lead to a false assumption of uniqueness.

Because this is a behavior change, bump __FreeBSD_version.

Discussed with: re (jhb, scottl)
MFC after: 3 days
Pointed out by: pkh (way back at EuroBSDCon)
Pointy hat: brooks


# 142240 22-Feb-2005 glebius

Typo in comment.


# 142228 22-Feb-2005 glebius

- In if_link_state_change() extract function body from if-block, to improve
readability.
- Call carp_carpdev_state() from if_link_state_change() if interface has
associated CARP interface.

Sponsored by: Rambler


# 142215 22-Feb-2005 glebius

Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by: mlaier
Obtained from: OpenBSD (mickey, mcbride)


# 141871 14-Feb-2005 delphij

Forced commit to clarify that the previous commit should read:

Security: This prevents a local DoS that can be exploited by
Security: both privileged and unprivileged users.


# 141749 12-Feb-2005 delphij

Validate ifc->ifc_len before submitting its incarnation to sbuf_new,
which will finally lead to kernel panic.

Security: This prevents a local (root-launched) DoS
Submitted by: Wojciech A. Koszek [dunstan at freebsd czest pl]
PR: 77421
MFC After: 1 week


# 141051 30-Jan-2005 glebius

Log changes of link state.

Reviewed by: rwatson


# 139903 08-Jan-2005 glebius

This change adds reliability for Ethernet trunks built with ng_one2many:

- Introduce another ng_ether(4) callback ng_ether_link_state_p, which
is called from if_link_state_change(), every time link is changed.
- In ng_ether_link_state() send netgraph control message notifying
of link state change to a node connected to "lower" hook.

Reviewed by: sam
MFC after: 2 weeks


# 139823 06-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 138542 08-Dec-2004 sam

Cleanup link state change notification:
o add new if_link_state_change routine that deals with link state changes
o change mii to use if_link_state_change


# 138239 30-Nov-2004 mlaier

Implement the check I was talking about in the previous message already.
Introduce domain_init_status to keep track of the init status of the domains
list (surprise). 0 = uninitialized, 1 = initialized/unpopulated, 2 =
initialized/done. Higher values can be used to support late addition of
domains which right now "works", but is potential dangerous. I choose to
only give a warning when doing so.

Use domain_init_status with if_attachdomain[1]() to ensure that we have a
complete domains list when we init the if_afdata array. Store the current
value of domain_init_status in if_afdata_initialized. This way we can update
if_afdata after a new protocol has been added (once that is allowed).

Submitted by: se (with changes)
Reviewed by: julian, glebius, se
PR: kern/73321 (partly)


# 138039 23-Nov-2004 rwatson

Assign if_broadcastaddr to NULL not 0 in if_attach().

Printf() a warning if if_attachdomain() is called more than once on an
interface to generate some noise on mailing lists when this occurs.

Fix up style in if_start(), where spaces crept in instead of tabs at
some point.

MFC after: 1 week
MFC note: Not the printf().


# 137065 30-Oct-2004 rwatson

Move if_handoff() from an inline in if_var.h to a function to if.c
in orden to harden the ABI for 5.x; this will permit us to modify
the locking in the ifnet packet dispatch without requiring drivers
to be recompiled.

MFC after: 3 days
Discussed at: EuroBSDCon Developer's Summit


# 136704 19-Oct-2004 rwatson

Define IFF_LOCKGIANT() and IFF_UNLOCKGIANT() macros, which conditionally
acquire Giant if the passed interface has IFF_NEEDSGIANT set on it.
Modify calls into (ifp)->if_ioctl() in if.c to use these macros in order
to ensure that Giant is held.

MFC after: 3 days
Bumped into by: jmg


# 135570 22-Sep-2004 green

Call sbuf_finish() before sbuf_data() so as to not panic the system.


# 135568 22-Sep-2004 brooks

Fix a LOR where ifconf() used copyout while holding a mutex. This LOR
was seen when configuring addresses on interfaces using ifconfig. This
patch has been verified to work with over eight thousand addresses
assigned to an interface.

LOR id: 031


# 135416 18-Sep-2004 brooks

Log the renaming of an interface. This should make it easier to follow
kernel log files.


# 134933 08-Sep-2004 brooks

Re-add ifi_epoch, to struct if_data, this time replacing ifi_unused
to avoid ABI changes. It is set to the last time the interface
counters were zeroed, currently the time if_attach() was called. It is
intentended to be a valid value for RFC2233's ifCounterDiscontinuityTime
and to make it easier for applications to verify that the interface they
find at a given index is the one that was there last time they looked.

Due to space constraints ifi_epoch is a time_t rather then a struct
timeval. SNMP would prefer higher precision, but this unlikely to be
useful in practice.


# 134859 06-Sep-2004 jmg

don't call f_detach if the filter has alread removed the knote.. This
happens when a proc exits, but needs to inform the user that this has
happened.. This also means we can remove the check for detached from
proc and sig f_detach functions as this is doing in kqueue now...

MFC after: 5 days


# 134630 02-Sep-2004 brooks

Back out ifi_epoch. The ABI breakage is too disruptive this close to
5-STABLE. ifi_epoch will shortly be reintroduced with less precistion
using the space currently allocated to ifi_unused.


# 134614 01-Sep-2004 mlaier

Fix an assertion when if_down()ing a ALTQ managed interface. The lock should
have been in place all the time the mtx_assert in the ALTQ code just
discovered the shortcoming.

PR: i386/71195
Tested by: Bettan (PR originator), myself
MFC after: 5 days


# 134609 01-Sep-2004 brooks

Use a spare byte in struct if_data to store the structure size without
increasing it. Add code to ifconfig to use this size to find the
sockaddr_dl after the struct if_data in the routing message. This
allows struct if_data to grow (up to 255 bytes) without breaking
ifconfig.

Submitted by: peter


# 134514 30-Aug-2004 brooks

Add a new variable, ifi_epoch, to struct if_data. It is set to the last
time the interface counters were zeroed, currently the time if_attach()
was called. It is indentended to be a valid value for RFC2233's
ifCounterDiscontinuityTime and to make it easier for applications to
verify that the interface they find at a given index is the one that was
there last time they looked.

An if_epoch "compatability" macro has not been created as ifi_epoch has
never been a member of struct ifnet.

Approved by: andre, bms, wollman


# 134399 27-Aug-2004 brooks

When detaching an interface, don't leave an obsolete pointer to the
soon to be deleted struct ifnet around.

PR: kern/52260
MFC After: 3 days


# 133741 15-Aug-2004 jmg

Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers. Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks. Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by: green, rwatson (both earlier versions)


# 133200 06-Aug-2004 roam

Do not attempt to clean up data that has not been initialized yet.
This fixes two kernel panics on boot when the xl driver fails to
allocate bus/port/memory resources.

Reviewed by: silence on -net


# 132712 27-Jul-2004 rwatson

Add a new network interface flag, IFF_NEEDSGIANT, which will allow
device drivers to declare that the ifp->if_start() method implemented
by the driver requires Giant in order to operate correctly.

Add a 'struct task' to 'struct ifnet' that can be used to execute a
deferred ifp->if_start() in the event that if_start needs to be called
in a Giant-free environment. To do this, introduce if_start(), a
wrapper function for ifp->if_start(). If the interface can run MPSAFE,
it directly dispatches into the interface start routine. If it can't
run MPSAFE, we're running with debug.mpsafenet != 0, and Giant isn't
currently held, the task is queued to execute in a swi holding Giant
via if_start_deferred().

Modify if_handoff() to use if_start() instead of direct dispatch.
Modify 802.11 to use if_start() instead of direct dispatch.

This is intended to provide increased compatibility for non-MPSAFE
network device drivers in the presence of Giant-free operation via
asynchronous dispatch. However, this commit does not mark any network
interfaces as IFF_NEEDSGIANT.


# 132362 18-Jul-2004 rwatson

Gratuitous whitespace change to un-wrap a short line.


# 130933 22-Jun-2004 brooks

Major overhaul of pseudo-interface cloning. Highlights include:

- Split the code out into if_clone.[ch].
- Locked struct if_clone. [1]
- Add a per-cloner match function rather then simply matching names of
the form <name><unit> and <name>.
- Use the match function to allow creation of <interface>.<tag>
vlan interfaces. The old way is preserved unchanged!
- Also the match function to allow creation of stf(4) interfaces named
stf0, stf, or 6to4. This is the only major user visible change in
that "ifconfig stf" creates the interface stf rather then stf0 and
does not print "stf0" to stdout.
- Allow destroy functions to fail so they can refuse to delete
interfaces. Currently, we forbid the deletion of interfaces which
were created in the init function, particularly lo0, pflog0, and
pfsync0. In the case of lo0 this was a panic implementation so it
does not count as a user visiable change. :-)
- Since most interfaces do not need the new functionality, an family of
wrapper functions, ifc_simple_*(), were created to wrap old style
cloner functions.
- The IF_CLONE_INITIALIZER macro is replaced with a new incompatible
IFC_CLONE_INITIALIZER and ifc_simple consumers use IFC_SIMPLE_DECLARE
instead.

Submitted by: Maurycy Pawlowski-Wieronski <maurycy at fouk.org> [1]
Reviewed by: andre, mlaier
Discussed on: net


# 130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


# 130508 14-Jun-2004 mlaier

Transform tbr_dequeue into a function pointer in order to build drivers with
ALTQ enabled versions of IFQ_* macros by default, as requested by serveral
others. This is a follow-up to the quick fix I committed yesterday which
turned off the ALTQ checks for non-ALTQ kernels.


# 130416 13-Jun-2004 mlaier

Link ALTQ to the build and break with ABI for struct ifnet. Please recompile
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
seperate commit. Same with the driver changes, which need case by case
evaluation.

__FreeBSD_version bump will follow.

Tested-by: (i386)LINT


# 128618 24-Apr-2004 luigi

arpcom untangling:

consistently with the rest of the code, use IFP2AC(ifp) to access
the arpcom structure given the ifp.

In this case also fix a difference in assumptions WRT the rest of
the net/ sources: it is not the 'struct *softc' that starts with a
'struct arpcom', but a 'struct arpcom' that starts with a
'struct ifnet'


# 128432 19-Apr-2004 luigi

Fix a recently introduced panic in if_detach() by delaying
the invalidation of ifindex_table[] entry. Probably this
code should be moved even further down, but for the time being
let's do it this way.


# 128407 18-Apr-2004 mlaier

Make if_(un)route static in if.c as they are called from if_up/if_down only.
This is also cleanup to make locking easier.

Reviewed by: luigi
Approved by: bms(mentor)


# 128316 16-Apr-2004 luigi

Use if_link instead of the alias if_list, and change a for() into
the TAILQ_FOREACH() form.

Comment the need to store the same info (mac address for ethernet-type
devices) in two different places.

No functional changes. Even the compiler output should be unmodified
by this change.


# 128311 16-Apr-2004 luigi

Consistently use ifaddr_byindex() to access the link-level address
of an interface. No functional change.

On passing, comment a likely bug in net/rtsock.c:sysctl_ifmalist()
which, if confirmed, would deserve to be fixed and MFC'ed


# 128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# 126901 13-Mar-2004 brooks

Don't allow interfaces to be renamed to the empty string.
While I'm here, errors aren't bools.

Pointed out by: hmp


# 126900 13-Mar-2004 brooks

Remove if_withname. It came in with the KAME import, but never got
used. Should someone need its functionality, it's a really expensive
implementation of:
ifnet_byindex(sdl->sdl_index)

Reviewed by: bde, ume


# 126264 26-Feb-2004 mlaier

Bring eventhandler callbacks for pf.
This enables pf to track dynamic address changes on interfaces (dailup) with
the "on (<ifname>)"-syntax. This also brings hooks in anticipation of
tracking cloned interfaces, which will be in future versions of pf.

Approved by: bms(mentor)


# 126080 21-Feb-2004 phk

Device megapatch 4/6:

Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.


# 126064 21-Feb-2004 yar

Minor beautifications related to style(9) and code consistency.
No functional changes.


# 126062 21-Feb-2004 yar

Improve the SIOCSIFCAP handler a bit:
- allow for ifp->if_ioctl being NULL, as the rest of ifioctl() does;
- give the interface driver a chance to report a error to the caller;
- don't forget to update ifp->if_lastchange upon successful modification
of interface operation parameters.


# 125411 04-Feb-2004 brooks

Add the kernel side of network interface renaming support.

The basic process is to send a routing socket announcement that the
interface has departed, change if_xname, update the sockaddr_dl
associated with the interface, and announce the arrival of the interface
on the routing socket.

As part of this change, ifunit() is greatly simplified by testing
if_xname directly. if_clone_destroy() now uses if_dname to look up the
cloner for the interface and if_dunit to identify the unit number.

Reviewed by: ru, sam (concept)
Vincent Jardin <vjardin AT free.fr>
Max Laier <max AT love2party.net>


# 125345 02-Feb-2004 brooks

More macro cleanup. Use the system roundup2() macro instead of making
our own ROUNDUP() macro.

Suggested by: bde


# 125109 27-Jan-2004 brooks

Cleanup malloc() use in if_attach():
- malloc() returns a void* and does not need a cast
- when called with M_WAITOK, malloc() can not return NULL so don't
check for that case. The result of the check was bogus anyway since
it would leave the interface broken.


# 125062 27-Jan-2004 brooks

Clean up macro usage in if_attach():
- Use the system offsetof macro rather then making out own.
- undef ROUND after we use it rather then polluting the whole file.


# 124872 23-Jan-2004 ru

Don't panic if there are more than 255 interfaces in the system.


# 123875 26-Dec-2003 green

Don't truncate the interface name in ifunit(). It's now possible to query
"very long interface names", e.g.:
ndis_atheros0: flags=8847<UP,BROADCAST,DEBUG,RUNNING,SIMPLEX,MULTICAST> mtu 1500


# 121816 31-Oct-2003 brooks

Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)


# 121777 30-Oct-2003 brooks

Replace a couple printfs with if_printfs.


# 121470 24-Oct-2003 ume

Since dp->dom_ifattach calls malloc() with M_WAITOK, we cannot
use mutex lock directly here. Protect ifp->if_afdata instead.

Reported by: grehan


# 121422 23-Oct-2003 des

Clean up whitespace, remove "register" keyword, ANSIfy.
No functional changes.


# 121341 22-Oct-2003 ume

protect by IFNET_RLOCK.


# 121161 17-Oct-2003 ume

- add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.

Obtained from: KAME


# 121135 16-Oct-2003 ume

AF_LINK sockaddr has to be attached to ifp->if_addrlist until the
end, as many of the code assumes that TAILQ_FIRST(ifp->if_addrlist)
is non-null.

Submitted by: itojun


# 120727 04-Oct-2003 sam

Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents. Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
may be returned; this was never used and added unwarranted complexity
for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
we assume the parent will remain as long as the clone; doing this avoids
a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
applications cannot/do-no know about mutex's. Doing this requires
that the mutex be the last element in the structure. A better solution
is to introduce an externalized version of struct rtentry but this is
a major task because of the intertwining of rtentry and other data
structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
work to eliminate many held references. If not these will be resolved
prior to release.
3. ATM changes are untested.

Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS (partly)


# 120559 28-Sep-2003 phk

I don't know from where the notion that device driver should or
even could call VOP_REVOKE() on vnodes associated with its dev_t's
has originated, but it stops right here.

If there are things people belive destroy_dev() needs to learn how to
do, please tell me about it, preferably with a reproducible test case.

Include <sys/uio.h> in bluetooth code rather than rely on <sys/vnode.h>
to do so.

The fact that some of the USB code needs to include <sys/vnode.h>
still disturbs me greatly, but I do not have time to chase that.


# 117786 19-Jul-2003 ume

Disabling multicast on vlan interface caused kernel panic.

PR: kern/40723
Submitted by: Hideki ONO <ono@kame.net>
MFC after: 1 week


# 114293 30-Apr-2003 markm

Fix some easy, global, lint warnings. In most cases, this means
making some local variables static. In a couple of cases, this means
removing an unused variable.


# 112451 20-Mar-2003 jhb

Use td->td_ucred instead of td->td_proc->p_ucred.


# 112037 09-Mar-2003 phk

Note that MAJOR_AUTO is now the default if d_maj is not initialized. This
is more robust and prevents the hijacking of /dev/console for the typical
mistake.

Remove unneeded MAJOR_AUTO uses, it is only needed explicitly now if the
driver source has cross-branch compatibility to old releases.


# 111821 03-Mar-2003 phk

Make nokqfilter() return the correct return value.

Ditch the D_KQFILTER flag which was used to prevent calling NULL pointers.


# 111815 03-Mar-2003 phk

Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by: re(scottl)


# 111678 28-Feb-2003 mux

Make the network /dev entries use MAJOR_AUTO.


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 109771 23-Jan-2003 fjoe

- add support for IPX (tested with mount -t nwfs and mars_nwe),
IP fast forwarding, SIOCGIFADDR, setting hardware address (not currently
enabled in cm driver), multicasts (experimental)
- add ARC_MAX_DATA, use IF_HANDOFF, remove arc_sprintf() and some unused
variables
- if_simloop logic is made more similar to ethernet
- drop not ours packets early (if we are not in promiscous mode)

Submitted by: mark tinguely (partially)


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 108250 24-Dec-2002 hsu

SMP locking for radix nodes.


# 108172 22-Dec-2002 hsu

SMP locking for ifnet list.


# 108033 18-Dec-2002 hsu

Lock up ifaddr reference counts.


# 106957 15-Nov-2002 sam

Back out rev 1.150; things are more complicated than this.


# 106955 15-Nov-2002 sam

if_attach should not sleep; change malloc's M_WAITOK to M_NOWAIT


# 103900 24-Sep-2002 brooks

Add a new helper function if_printf() modeled on device_printf(). The
function takes a struct ifnet pointer followed by the usual printf
arguments and prints "<interfacename>: " before the results of printf.
Since this is the primary form of printf calls in network device drivers
and accounts for most uses of the ifnet menber if_unit, this
significantly simplifies many printf()s.


# 102118 19-Aug-2002 jmallett

Clean up a comment talking about C strings, which are terminated with the
ASCII NUL character (0, or '\0' in C).


# 102100 19-Aug-2002 sobomax

Previous deltas (promisc mode) were a subject of:

MFC after: 1 week


# 102099 19-Aug-2002 sobomax

Implement user-setable promiscuous mode (a new `promisc' flag for ifconfig(8)).
Also, for all interfaces in this mode pass all ethernet frames to upper layer,
even those not addressed to our own MAC, which allows packets encapsulated
in those frames be processed with packet filters (ipfw(8) et al).

Emphatically requested by: Anton Turygin <pa3op@ukr-link.net>
Valuable suggestions by: fenner


# 102052 18-Aug-2002 sobomax

Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid
breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in
SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's.

Reviewed by: -hackers, -net


# 101184 01-Aug-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Introduce two ioctls, SIOCGIFMAC, SIOCSIFMAC, which permit user
processes to manage the MAC labels on network interfaces. Note
that this is part of the user process API/ABI that will be revised
prior to 5.0-RELEASE.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 101079 31-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the interface management code so that MAC labels are
properly maintained on network interfaces (struct ifnet). In
particular, invoke entry points when interfaces are created and
removed. MAC policies may initialized the label interface based
on a variety of factors, including the interface name.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 99250 02-Jul-2002 mini

Check retifma for NULL before using it.

PR: kern/9391
Submitted by: Assar Westerlund <assar@sics.se>
MFC after: 3 days


# 97289 25-May-2002 brooks

Move all unit number management cloned interfaces into the cloning
code. The reverts the API change which made the <if>_clone_destory()
functions return an int instead of void bringing us into closer
alignment with NetBSD.

Reviewed by: net (a long time ago)


# 95023 19-Apr-2002 suz

just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD.
(based on freebsd4-snap-20020128)

Reviewed by: ume
MFC after: 1 week


# 94348 10-Apr-2002 peter

Add missing 'struct ifreq ifr;' that was forgotten in the last commit.


# 94344 10-Apr-2002 suz

fixed a kernel crash when enabling multicast on vlan interface
owing to a NULL argument to vlan_ioctl() at if_allmulti().

Reviewed by: ume
MFC after: 1 week


# 93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


# 93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# 93546 01-Apr-2002 ume

Make `route add -inet6 default ::1 -ifp gif0' work actually.
The change between 1.13 and 1.14 is specific to AF_INET.

MFC after: 1 week


# 92725 19-Mar-2002 alfred

Remove __P.


# 92081 11-Mar-2002 mux

Simplify the interface cloning framework by handling unit
unit allocation with a bitmap in the generic layer. This
allows us to get rid of the duplicated rman code in every
clonable interface.

Reviewed by: brooks
Approved by: phk


# 91699 05-Mar-2002 green

Use revoke_and_destroy_dev() instead of destroy_dev() when removing /dev/net
pseudo-devices when an interface goes away. Otherwise, an open /dev/net/foo0
when the interface is removed can cause a crash.

Not objected to by: jlemon


# 91647 04-Mar-2002 brooks

Change the network interface cloning API so the destroy function returns
an int errorcode instead of void in preperation for merging cloning of
the loopback device.

Submitted by: mux
MFC after: 2 weeks


# 91406 27-Feb-2002 jhb

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


# 91266 25-Feb-2002 peter

Fix a warning by pulling prototype for arp_ifinit() into scope.
Then fix cast the correct value into an incorrect value, which was not
detected due to the missing prototype (but was harmless anyway).


# 90875 18-Feb-2002 luigi

When the local link address is changed, send out gratuitous ARPs
to notify other nodes about the address change. Otherwise, they
might try and keep using the old address until their arp table
entry times out and the address is refreshed.

Maybe this ought to be done for INET6 addresses as well but i have
no idea how to do it. It should be pretty straightforward though.

MFC-after: 10 days


# 89498 18-Jan-2002 ru

Introduce an interface announcement message for the routing
socket so that routing daemons and other interested parties
know when an interface is attached/detached.

PR: kern/33747
Obtained from: NetBSD
MFC after: 2 weeks


# 85079 17-Oct-2001 jlemon

Add a SIOCGIFINDEX ioctl, which returns the index of a named interface.
This will be used to more efficiently support if_nametoindex(3).


# 85077 17-Oct-2001 jlemon

Cleanup ifunit(), so it uses the dev_named() function to map an interface
name into a device.


# 85074 17-Oct-2001 ru

Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2.

Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *''
as the argument. Pass rt_addrinfo all the way down to rtrequest1
and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now
``rt_addrinfo *'' instead of ``sockaddr *'' (almost noone is
using it anyways).

Benefit: the following command now works. Previously we needed
two route(8) invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

Remove unsafe typecast in rtrequest(), from ``rtentry *'' to
``sockaddr *''. It was introduced by 4.3BSD-Reno and never
corrected.

Obtained from: BSD/OS, NetBSD
MFC after: 1 month
PR: kern/28360


# 85050 17-Oct-2001 ru

Revision 1.13 corresponded to CSRG revision 8.4.
Revision 1.59 corresponded to CSRG revision 8.5.


# 85042 17-Oct-2001 fenner

if_index is the highest interface index in the system, not the next
available index.


# 84931 14-Oct-2001 fjoe

bring in ARP support for variable length link level addresses

Reviewed by: jdp
Approved by: jdp
Obtained from: NetBSD
MFC after: 6 weeks


# 84817 11-Oct-2001 jlemon

Fix the ``WARNING: Driver mistake: repeat make_dev'', caused by using
the wrong index variable within a loop. I have no idea how this managed
to work on my test box.

Spotted by: fenner


# 84787 11-Oct-2001 jlemon

Move device nodes into a /dev/net/ directory, to avoid conflict with
existing devices (e.g.: tunX). This may need a little more thought.

Create a /dev/netX alias for devices. net0 is reserved.

Allow wiring of net aliases in /boot/device.hints of the form:
hint.net.1.dev="lo0"
hint.net.12.ether="00:a0:c9:c9:9d:63"


# 84139 29-Sep-2001 jlemon

Add ability to attach knotes to network devices.
Introduce EVFILT_NETDEV to report network device changes.


# 84106 29-Sep-2001 jlemon

Introduce network device nodes. Network devices will now automatically
appear in /dev. Interface hardware ioctls (not protocol or routing) can
be performed on the descriptor. The SIOCGIFCONF ioctl may be performed
on the special /dev/network node.


# 83624 18-Sep-2001 jlemon

Add two fields to the ifnet structure indicating what extra capabilities
a network device has, and which ones are enabled.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 83185 07-Sep-2001 jlemon

Fix another shortcircuit return() statement that I missed.


# 83184 07-Sep-2001 jlemon

Fix sense of comparison in space test. Also eliminate a compile
warning and remove a previously existing off-by-one error.


# 83130 06-Sep-2001 jlemon

Wrap array accesses in macros, which also happen to be lvalues:

ifnet_addrs[i - 1] -> ifaddr_byindex(i)
ifindex2ifnet[i] -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.


# 83129 05-Sep-2001 jlemon

Cosmetic cleanups and rearrangement for code to come. There should be
no functional change in this commit.


# 79103 02-Jul-2001 brooks

Add kernel infrastructure for network device cloning.

Reviewed by: ru, ume
Obtained from: NetBSD
MFC after: 1 week


# 78064 11-Jun-2001 ume

Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.

Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks


# 76083 27-Apr-2001 fenner

Better handling of ioctl(SIOCSIFFLAGS) failing in ifpromisc():
- Don't print the "promiscuous mode (enabled|disabled)" on failure
- Restore the reference count on failure


# 75179 04-Apr-2001 yar

Change the type of the VLAN interface from IFT_PROPVIRTUAL,
which was a temporary hack, to IFT_L2VLAN, which is the type
assigned by IANA.


# 75096 02-Apr-2001 brian

If ifpromisc() fails the SIOCSIFFLAGS ioctl, put ifp->if_flags
back the way we found them.


# 74913 28-Mar-2001 jhb

Use mtx_initiaalized() rather than violating the internals of the mutex
structure.


# 74852 27-Mar-2001 yar

Don't bypass notifying a corresponding interface
when leaving a link-layer multicast group.

PR: kern/22176
Reviewed by: wollman


# 72786 21-Feb-2001 rwatson

o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.

Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project


# 72084 06-Feb-2001 phk

Convert if_multiaddrs from LIST to TAILQ so that it can be traversed
backwards in the three drivers which want to do that.

Reviewed by: mikeh


# 72012 04-Feb-2001 phk

Another round of the <sys/queue.h> FOREACH transmogriffer.

Created with: sed(1)
Reviewed by: md5(1)


# 71999 04-Feb-2001 phk

Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)


# 71959 03-Feb-2001 phk

Use <sys/queue.h> macro api rather than fondle its implementation detals.

Created with: /usr/bin/sed
Reviewed by: /sbin/md5


# 71853 30-Jan-2001 jasone

Revert mutex initialization check to look at mtx_description.

Pointed out by: jlemon, jhb


# 71352 21-Jan-2001 jasone

Move most of sys/mutex.h into kern/kern_mutex.c, thereby making the mutex
inline functions non-inlined. Hide parts of the mutex implementation that
should not be exposed.

Make sure that WITNESS code is not executed during boot until the mutexes
are fully initialized by SI_SUB_MUTEX (the original motivation for this
commit).

Submitted by: peter


# 69781 08-Dec-2000 dwmalone

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


# 69152 25-Nov-2000 jlemon

Lock down the network interface queues. The queue mutex must be obtained
before adding/removing packets from the queue. Also, the if_obytes and
if_omcasts fields should only be manipulated under protection of the mutex.

IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on
the queue. An IF_LOCK macro is provided, as well as the old (mutex-less)
versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which
needs them, but their use is discouraged.

Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF,
which takes care of locking/enqueue, and also statistics updating/start
if necessary.


# 67164 15-Oct-2000 phk

Remove unneeded #include <machine/clock.h>


# 66640 04-Oct-2000 itojun

make sure we have root priv on SIOCSIFPHY*. from thorpej@netbsd


# 64651 14-Aug-2000 archie

Export the functionality of SIOCSIFLLADDR with if_setlladdr()
and add some more rigorous sanity checking in the process.

Reviewed by: freebsd-net


# 63241 15-Jul-2000 itojun

improve route/nd cache cleanup on interface removal.
CAVEAT: haven't really tested it yet, please report


# 62290 30-Jun-2000 archie

Previous commit didn't work; this time really fix it.


# 62264 29-Jun-2000 archie

Fix kernel build breakage when 'device ether' was not included.


# 62143 26-Jun-2000 archie

Make the ng_ether(4) node type dynamically loadable like the rest.
This means 'options NETGRAPH' is no longer necessary in order to get
netgraph-enabled Ethernet interfaces. This supports loading/unloading
the ng_ether.ko and attaching/detaching the Ethernet interface in any
order.

Add two new hooks 'upper' and 'lower' to allow access to the protocol
demux engine and the raw device, respectively. This enables bridging
to be defined as a netgraph node, if so desired.

Reviewed by: freebsd-net@freebsd.org


# 61734 16-Jun-2000 wpaul

Implement SIOCSIFLLADDR, which allows you to change the link-level
address on an interface. This basically allows you to do what my
little setmac module/utility does via ifconfig. This involves the
following changes:

socket.h: define SIOCSIFLLADDR
if.c: add support for SIOCSIFLLADDR, which resets the values in
the arpcom struct and sockaddr_dl for the specified interface.
Note that if the interface is already up, we need to down/up
it in order to program the underlying hardware's receive filter.
ifconfig.c: add lladdr command
ifconfig.8: document lladdr command

You can now force the MAC address on any ethernet interface to be
whatever you want. (The change is not sticky across reboots of course:
we don't actually reprogram the EEPROM or anything.) Actually, you
can reprogram the MAC address on other kinds of interfaces too; this
shouldn't be ethernet-specific (though at the moment it's limited to
6 bytes of address data).

Nobody ran up to me and said "this is the politically correct way to
do this!" so I don't want to hear any complaints from people who think
I could have done it more elegantly. Consider yourselves lucky I didn't
do it by having ifconfig tread all over /dev/kmem.


# 59468 21-Apr-2000 guido

IOCGIFCONF once and for all. Sometimes the ifc_len variable
would be returned with a wrong value.
While we're here, get rid of unnecessary panic call.

PR: 17311, 12996, 14457
Submitted by: Patrick Bihan-Faou <patrick@mindstep.com>,
Kris Kennaway <kris@FreeBSD.org>


# 57570 28-Feb-2000 guido

This fixes a problem where the SIOCGIFCONF ioctl goes wrong. This
is triggered when qmail is used with INET6 enabled. The bug
manifests itself in that the space variable can become negative
and that in the comparison in the guards of the 2 loops, this was
not noticed because sizeof() returns an unsigned and thus the signed
variable gets promoted to unsigned. I decided not to make space
unsigned because I think we should guard against this from happening.
Thus panic() in case space becomes negative.

Approved by: jkh


# 56938 01-Feb-2000 shin

Add workaround for fxp issue at interface initialization with IPv6.

Some LAN card chip for fxp is known to cause problem at
interface initialization with IPv6 enabled. It happens at
some delicate timing.
And also, just adding some DELAY before IPv6 address
autoconfiguration is known to avoid the problem.

Complete fix is changing the driver not to use interrupt at
multicast filter initialization, but trying such change in
this stage will be dangerous.

So I add some DELAY() only inside #ifdef INET6 part,
as temporal workaround only for 4.0.

Approbed by: jkh

Noticed by: Mattias Pantzare <pantzer@ludd.luth.se>

Obtained from: openbsd-tech mailing list


# 56517 24-Jan-2000 ru

Notify user processes about interface's MTU change.

Reviewed by: wollman, freebsd-net


# 55276 30-Dec-1999 shin

Prevent kernel panic at ifconfig up after Note PC resume.

Submitted by: imp, kuriyama
Reviewed by: imp


# 54728 17-Dec-1999 imp

Two more fixes to if_detach. These are generic to all interfaces and
do not pollute the interface further.

o Run if_detach at splnet().
o Creatively swipe the relevant parts of the netatm atm_nif_detach
which will delete the relevant references to the interface going
away.


# 54557 13-Dec-1999 bp

Allow ifunit() routine to understand names like ed0f2. Also
fix a bug caused by using bcmp() instead of strcmp().

Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>


# 54410 10-Dec-1999 imp

Add some gross ad-hock hacks to increase stability of if_detach:
o be more careful about clearing addresses (this isn't a kludge)
o For AF_INET interfaces, call SIOCDIFFADDR to remove last(?) bit
of cruft.

Special cases for AF_INET shouldn't be here, but I didn't see a good
generic way of doing this. If I missed something, please let me know.

This gross hack makes pccard ejection stable for ethernet cards.

Submitted by: Atushi Onoe-san


# 54263 07-Dec-1999 shin

udp IPv6 support, IPv6/IPv4 tunneling support in kernel,
packet divert at kernel for IPv6/IPv4 translater daemon

This includes queue related patch submitted by jburkhol@home.com.

Submitted by: queue related patch from jburkhol@home.com
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# 53541 22-Nov-1999 shin

KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# 50655 30-Aug-1999 sheldonh

For every "promiscuous mode enabled" message printed for an interface,
print a matching "disabled" message when we drop out of promiscuous
mode for that interface.

Discussed on the freebsd-hackers mailing list.


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 49459 06-Aug-1999 brian

Define IF_MAXMTU and IF_MINMTU and don't allow MTUs with
out-of-range values.

``comparison is always 0'' warnings are silly !

Ok'd by: wollman, dg
Advised by: bde


# 48021 19-Jun-1999 phk

Add a new interface ioctl, to return "aux status".

This is inteded for to allow ifconfig to print various unstructured
information from an interface.

The data is returned from the kernel in ASCII form, see the comment in
if.h for some technicalities.

Canonical cut&paste example to be found in if_tun.c

Initial use:
Now tun* interfaces tell the PID of the process which opened them.

Future uses could be (volounteers welcome!):
Have ppp/slip interfaces tell which tty they use.
Make sync interfaces return their media state: red/yellow/blue
alarm, timeslot assignment and so on.
Make ethernets warn about missing heartbeats and/or cables


# 47778 06-Jun-1999 phk

typo in previous commit


# 47777 06-Jun-1999 phk

Introduce IFF_SMART bit.

This means that the driver will add/delete routes when it knows it is
up/down, rather than have the generic code belive it is up if configured.

This is probably most useful for serial lines, although many PHY chips
could probably tell us if we're connected to the cable/hub as well.


# 46155 28-Apr-1999 phk

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


# 46112 27-Apr-1999 phk

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


# 46091 26-Apr-1999 peter

Protect the ifinit() function's internals with splimp() for safety since
it used to be that way. I'm not sure that it's needed, but it does
walk the ifp list..

Incidently, there's nothing to sanity check the ifq_maxlen on loaded
interfaces..


# 45720 16-Apr-1999 peter

Bring the 'new-bus' to the i386. This extensively changes the way the
i386 platform boots, it is no longer ISA-centric, and is fully dynamic.
Most old drivers compile and run without modification via 'compatability
shims' to enable a smoother transition. eisa, isapnp and pccard* are
not yet using the new resource manager. Once fully converted, all drivers
will be loadable, including PCI and ISA.

(Some other changes appear to have snuck in, including a port of Soren's
ATA driver to the Alpha. Soren, back this out if you need to.)

This is a checkpoint of work-in-progress, but is quite functional.

The bulk of the work was done over the last few years by Doug Rabson and
Garrett Wollman.

Approved by: core


# 44144 19-Feb-1999 phk

Since ifru_flags is a short, we can fit in a copy of the flags
before they got changed. This can help eliminate much of the
gymnastics drivers do in their ioctl routines to figure this out.

Remove commented out IFF_NOTRAILERS


# 43508 01-Feb-1999 phk

Print a message if the driver didn't initialize ifq_maxlen.
Drivers should be updated if they get flagged by this message.

(The reason this is important is because we do not have a way
to catch this mistake for interfaces added after ifinit() runs.)


# 41879 16-Dec-1998 phk

Generalize the if_up() and if_down() functions under the names
if_route() and if_unroute().

This is first step towards sanitizing IFF_UP and IFF_RUNNING


# 41514 04-Dec-1998 archie

Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>


# 38293 12-Aug-1998 wpaul

One-liner: add a call to the underlying device driver's SIOCDELMULTI
ioctl() routine at the end of if_delmulti() so that interfaces with
hardware multicast filtering can update their filters in a timely
manner.

If the interface doesn't support hardware multicast filtering, then
reception of multicast frames is done using 'promiscious mode' or
'capture all multicast frames' mode and software filtering in the
kernel. In this case, it doesn't matter if if_delmulti() ever does
an SCIODELMULTI on the interface or not: if MULTICAST support is
enabled, then we join the 'all hosts' group when the interface is
configured, and remain in it until the interface is brought down.
Without hardware filtering, joining one group means joining all
groups, so it makes no difference if we call the SIOCDELMULTI
routine.

If the interface does support hardware multicast filtering, then
by not reprogramming the hardware filter in if_delmulti(), we have
to wait until somebody calls if_setmulti(), during which time the
interface is receiving frames for multicast groups in which we are
no longer interested.


# 37778 20-Jul-1998 dfr

Make sure the link level sockaddr size is rounded up correctly on alpha.


# 36775 08-Jun-1998 julian

Don't let ifunit() modify the string passed as an argument.
it may be in the text segment and write protected.


# 36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# 35067 06-Apr-1998 phk

Use getmicrotime() for if_lastchange, 10msec is plenty precision.


# 31778 16-Dec-1997 eivind

Make COMPAT_43 and COMPAT_SUNOS new-style options.


# 30813 28-Oct-1997 bde

Removed unused #includes.


# 30354 12-Oct-1997 phk

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# 30199 07-Oct-1997 joerg

Ooops, this should have made it into the same commit, but didn't.

Introduce the SIOC[SG]IFGENERIC hooks that can be used to pass an
arbritrary ioctl subcommand into an interface driver. Surprisingly
enough, there was no provision for this already present (except of the
option of abusing SIOC[SG]IFMEDIA for this).

The idea is that an interface driver can establish ioctl subcommands
of its own that can't be meaningfully interpreted by the upper layer
interface ioctl function. Something like this is required to
implement a clean solution of passing down things like CHAP secrets or
PPP options to the /sys/net/if_sppp* files. (Yes, my CHAP is now
finally working with it, but i gotta update my kernel to the new
callout interface before being able to commit _that_.)

Reviewed by: peter [long ago, actually]


# 29194 07-Sep-1997 joerg

Fix a typo that becomes apparent when compiling without COMPAT_443.

Submitted by: Tony Kimball <Anthony.Kimball@East.Sun.COM>


# 29024 01-Sep-1997 bde

Added used #include - don't depend on <sys/mbuf.h> including
<sys/malloc.h> (unless we only use the bogusly shared M*WAIT flags).


# 28845 27-Aug-1997 julian

Add a per-interface-address pointer to a function that can be supplied
by a protocol, to detirmine if an address matches the net this address
is part of. This is needed by protocols for which netmasks
"just don't work", for example appletalk.

Also add the code in appletalk to make use of this new feature.
Thsi fixes one of the longest standing bugs in appletalk.
The inability to talk to machines to which the path is via a router
which is on a different net, but the same netrange, as your interface.
Protocols that do not supply this function (e.g. IP) should not be affected.


# 28608 22-Aug-1997 julian

add some comments while trying to understand why appletalk
gets some things wrong.
(part of my continuing "comment it as you understand it" effort :)


# 27265 07-Jul-1997 julian

Don't add an item to the multicast linked list if it's already
on the list.


# 25434 03-May-1997 peter

add SIOC{S,G}IFMEDIA ioctl support


# 25201 27-Apr-1997 wollman

The long-awaited mega-massive-network-code- cleanup. Part I.

This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.

2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.

3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call. Protocols should use the process pointer they are now passed.

4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.

5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.

As a result, LINT is now broken. I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.


# 24204 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 2: include
<sys/sockio.h> instead of <sys/ioctl.h> in network files.


# 22718 14-Feb-1997 wollman

Send RTM_IFINFO messages whenever promiscuous and all-multicast
modes are enabled or disabled.


# 22614 12-Feb-1997 wollman

Implement PRC_IFUP a la PRC_IFDOWN so that protocols know when an interface
has come bacl up (and can referse actions taken as a result of downing).


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 21666 13-Jan-1997 wollman

Use the new if_multiaddrs list for multicast addresses rather than the
previous hackery involving struct in_ifaddr and arpcom. Get rid of the
abominable multi_kludge. Update all network interfaces to use the
new machanism. Distressingly few Ethernet drivers program the multicast
filter properly (assuming the hardware has one, which it usually does).


# 21437 08-Jan-1997 wollman

Fix typo. I hate waking up at 4:45 in the morning...


# 21434 08-Jan-1997 wollman

Fix a few oversights in the new multicast membership interface.


# 21404 07-Jan-1997 wollman

Checkpoint the beginnings of the new kernel interface for
multicast group memberships. This is not actually operative
at the moment (a lot of other code still needs to be changed), but
this seemed like a useful reference point to check in so that
others (i.e. Bill Fenner) have fair warning of where we are going.


# 20407 13-Dec-1996 wollman

Convert the interface address and IP interface address structures
to TAILQs. Fix places which referenced these for no good reason
that I can see (the references remain, but were fixed to compile
again; they are still questionable).


# 20337 11-Dec-1996 wollman

Use queue macros for the list of interfaces. Next stop: ifaddrs!


# 17462 07-Aug-1996 julian

Submitted by: archie@whistle.com
This is a patch to sys/net/if.c. What it does is patch the algorithm
for finding an IP address on an interface which most closely matches
a given IP address. The problem with it is when no address matches,
and you have to just pick one at random. Then the code ends up picking
the last IP address in the list. This patch changes things so it
picks up the first address instead.
Usually the first address is more useful as the later ones are aliases.


# 17352 30-Jul-1996 wollman

Add better support for retrieving management information from network
interfaces. This creates two new tables in the net.link.generic branch
of the MIB; one contains (essentially) `ifdata' structures, and the other
contains a blob provided by the interface (and presumably used to
implement link-layer-specific MIB variables). A number of things
have been moved around in the `ifnet' and `ifdata' structures, so
NEW VERSIONS OF ifconfig(8) AND routed(8) ARE REQUIRED. (A simple
recompile is all that's necessary.)

I have a sample program which uses this interface for those interested
in making use of it.


# 17270 24-Jul-1996 wollman

Fix a bug in ifa_ifwithnet() which caused a page fault in bcmp()
when attepmting to add certain types of routes. This problem
only manifested itself in the presence of unconfigured point-to-point
interfaces.

Noticed by: Chuck Cranor <chuck@maria.wustl.edu>


# 17096 11-Jul-1996 wollman

Modify the kernel to use the new pr_usrreqs interface rather than the old
pr_usrreq mechanism which was poorly designed and error-prone. This
commit renames pr_usrreq to pr_ousrreq so that old code which depended on it
would break in an obvious manner. This commit also implements the new
interface for TCP, although the old function is left as an example
(#ifdef'ed out). This commit ALSO fixes a longstanding bug in the
TCP timer processing (introduced by davidg on 1995/04/12) which caused
timer processing on a TCB to always stop after a single timer had
expired (because it misinterpreted the return value from tcp_usrreq()
to indicate that the TCB had been deleted). Finally, some code
related to polling has been deleted from if.c because it is not
relevant t -current and doesn't look at all like my current code.


# 16332 12-Jun-1996 gpalmer

Since the updates to ifnet.if_lastchange are so rare (relatively
speaking), go for the extra accuracy and call microtime() to get
the current time.

Pointed Out By: bde


# 16287 10-Jun-1996 gpalmer

Change the use if ifnet.if_lastchange to be more in line with
SNMP requirements. Update description of ifnet.if_lastchange in if.h
to indicate this.


# 16142 05-Jun-1996 wollman

Don't allow trailing garbage after the unit number in ifunit().


# 14546 11-Mar-1996 dg

Move or add #include <queue.h> in preparation for upcoming struct socket
changes.


# 13981 08-Feb-1996 wollman

If a slow input queue was defined by the driver, initialize it.


# 13937 06-Feb-1996 wollman

Clean up Ethernet drivers:
- fill in and use ifp->if_softc
- use if_bpf rather than private cookie variables
- change bpf interface to take advantage of this
- call ether_ifattach() directly from Ethernet drivers
- delete kludge in if_attach() that did this indirectly


# 13619 24-Jan-1996 phk

Use new printf features rather than local kludges.


# 12942 20-Dec-1995 wollman

in_proto.c: spell ``Internet'' right and put whitespace after commas.

others: start to populate the link-layer branch of the net mib, by
moving ARP to its proper place. (ARP is not a protocol family, it's an
interface layer between a medium-access layer and a protocol family.)
sysctl(8) needs to be taught about the structure of this branch, unless
Poul-Henning implements dynamic MIB exploration soon.


# 12706 09-Dec-1995 phk

Staticize, clean lint.


# 12628 05-Dec-1995 dg

all:
Removed ifnet.if_init and ifnet.if_reset as they are generally unused.
Change the parameter passed to if_watchdog to be a ifnet * rather than
a unit number. All of this is an attempt to move toward not needing an
array of softc pointers (which is usually static in size) to point to
the driver softc.

if_ed.c:
Changed some of the argument passing to some functions to make a little
more sense.

if_ep.c, if_vx.c:
Killed completely bogus use of if_timer. It was being set in such a way
that the interface was being reset once per second (blech!).


# 12374 18-Nov-1995 bde

Added bogus casts to avoid warnings.

Continued cleaning up sysinit stuff.


# 11029 27-Sep-1995 wollman

Add newline at end of log message and reduce log level to INFO from NOTICE.


# 10957 22-Sep-1995 wollman

Fix BPf to generate a header mbuf for writes.
Fix loopback and discard interfaces to understand BPF writes.
(These two from Bill Fenner to fix PR 512.)

Move ifpromisc() from bpf.c to if.c as suggested by comment in BPF.
Send a notice to the log when promiscuous mode is enabled.


# 10653 09-Sep-1995 dg

Fixed init functions argument type - caddr_t -> void *. Fixed a couple of
compiler warnings.


# 10358 28-Aug-1995 julian

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


# 9348 28-Jun-1995 dg

Don't skip point-to-point interfaces if the netmask==0 (the netmask
should be completely ignored for point-to-point interfaces).
For point-to-point interfaces, route based on the destination address,
not the local address.

Submitted by: Peter Wemm


# 9234 14-Jun-1995 dg

Took out P2P_LOCALADDR_SHARE option and made it standard.


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 8788 27-May-1995 dg

Added a fix for a bug which caused the wrong interface to be selected
for broadcasts if point-to-point links shared the same IP address as
the ethernet. The fix must be enabled with P2P_LOCALADDR_SHARE option
in the kernel config file. This will someday likely be standard, but
there isn't sufficient time before release to determine if there are
any interoperability problems with routed and/or gated.

Reviewed by: Garrett Wollman, and me
Submitted by: Peter Wemm


# 6687 24-Feb-1995 dg

In ifa_ifwithdstaddr() when walking through ifa structs associated with
a point-to-point link, don't attempt a comparison if the pointer to the
destination sockaddr is NULL (i.e. it has not been set/initialized).


# 5280 30-Dec-1994 dg

Moved declaration of ifnet pointer out of the header file and into the
.c file where it belongs. Bezeroed some uninitialized malloc data.


# 5184 21-Dec-1994 wollman

Add generic part of generic multiple-physical-interface support (the
successor of IFF_ALTPHYS).


# 4345 10-Nov-1994 guido

Remove redundant stuff. Amazing that they actually solved a bug found in
1.1.5.1, and oversaw this thang.


# 3419 07-Oct-1994 phk

Mostly Cosmetics. Some of the procedures in if_sl.c was void, but should
be int. I made them int, and let them return 0. Will have to find out
what the return-val is used for.


# 3377 05-Oct-1994 wollman

A number of bug-fixes inspired by Mark Treacy:
- Allow PPP to run multicasts natively.
- Deal properly with lots of similarly-named interfaces.
- Don't sign-extend if_flags.

NB: the last fix (to rtsock.c) must be reversed when we expand if_flags to a
reasonable size.

Submitted by: Mark Treacy


# 2822 16-Sep-1994 phk

Made the kernel compile even without "ether".


# 2112 18-Aug-1994 wollman

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


# 1943 08-Aug-1994 dg

On second thought, better restrict the mtu to between 72-65535...strange
things happen otherwise.


# 1942 08-Aug-1994 dg

Enforce the mtu to between the range 1-65535 before calling the driver
ioctl routine.


# 1941 08-Aug-1994 dg

Added ioctl support for SIOCGIFMTU and SIOCSIFMTU. These set the per-
interface MTU.


# 1817 02-Aug-1994 dg

Added $Id$


# 1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources