History log of /openbsd-current/sys/net/if_trunk.c
Revision Date Author Comments
# 1.154 23-Dec-2023 bluhm

Back out "always allocate per-CPU statistics counters for network
interface descriptor". It panics during attach of em(4) device at
boot.


# 1.153 22-Dec-2023 mvs

Always allocate per-CPU statistics counters for network interface
descriptor.

Network interface statistics are a mess. Only pseudo drivers
allocate per-CPU counters; all other network devices use the old
`if_data'. The network stack partially uses per-CPU counters and
partially uses `if_data', and the protection is inconsistent: sometimes
counters are accessed with the exclusive netlock, sometimes with the
shared netlock, sometimes with the kernel lock but without the netlock,
and sometimes with other locks.

To make network interface statistics more consistent, always allocate
per-CPU counters at interface attachment time and use them instead of
`if_data'. This step only moves counter allocation into the if_attach()
internals. The `if_data' removal will be done in follow-up diffs to
make review and testing easier.

ok bluhm
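
As a rough illustration of the per-CPU counter idea described in this entry, the standalone sketch below keeps one row of counters per CPU and sums the rows on read. The names (`NCPU`, `ifc_inc`, `ifc_read`) and the fixed CPU count are assumptions made for the example; this is not the kernel's counters API.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4			/* hypothetical CPU count */

enum ifc_index { IFC_IPACKETS, IFC_OPACKETS, IFC_NCOUNTERS };

/* One row of counters per CPU; each CPU only ever writes its own row. */
struct ifc_percpu {
	uint64_t c[NCPU][IFC_NCOUNTERS];
};

/* Writer side: bump a counter in the row owned by the calling CPU. */
void
ifc_inc(struct ifc_percpu *pc, unsigned int cpu, enum ifc_index idx)
{
	assert(cpu < NCPU);
	pc->c[cpu][idx]++;
}

/* Reader side: a total is the sum over all per-CPU rows. */
uint64_t
ifc_read(const struct ifc_percpu *pc, enum ifc_index idx)
{
	uint64_t sum = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < NCPU; cpu++)
		sum += pc->c[cpu][idx];
	return (sum);
}
```

Because writers never share a row, the write path needs no common lock in this model; only the reader has to visit every row.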


Revision tags: OPENBSD_7_0_BASE OPENBSD_7_1_BASE OPENBSD_7_2_BASE OPENBSD_7_3_BASE OPENBSD_7_4_BASE
# 1.152 02-Aug-2021 mvs

Don't call rtm_ifchg() in trunk_port_state().

The preceding trunk_link_active() already produced an RTM_IFINFO message when
the trunk(4) state changed. In that case we send a duplicate RTM_IFINFO
message, or we produce a false message when the trunk(4) state did not change.

ok florian@


Revision tags: OPENBSD_6_9_BASE
# 1.151 28-Jan-2021 mvs

trunk(4): convert ifunit to if_unit(9)

ok bluhm@


Revision tags: OPENBSD_6_8_BASE
# 1.150 12-Sep-2020 kn

Keep port interface UP on removal

There is no reason to change flags on member interfaces when removing
them, aggr(4) does not pull its members down either.

OK florian bluhm


# 1.149 28-Jul-2020 mvs

Add missing `IFXF_CLONED' flag to clone interfaces.

ok mpi@


# 1.148 22-Jul-2020 dlg

deprecate interface input handler lists, just use one input function.

the interface input handler lists were originally set up to help
us during the initial mpsafe network stack work. at the time not all
the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc)
were mpsafe, so we wanted a way to avoid them by default, and only
take the kernel lock hit when they were specifically enabled on the
interface. since then, they have been fixed up to be mpsafe.

i could leave the list in place, but it has some semantic problems.
because virtual interfaces filter packets based on the order they
were attached to the parent interface, you can get packets taken
away in surprising ways, especially when you reboot and netstart
does something different to what you did by hand. by hardcoding the
order that things like vlan and bridge get to look at packets, we
can document the behaviour and get consistency.

it also means we can get rid of a use of SRPs which were difficult
to replace with SMRs. the interface input handler list is an SRPL,
which we would like to deprecate. it turns out that you can sleep
during stack processing, which you're not supposed to do with SRPs
or SMRs, but SRPs are a lot more forgiving and it worked.

lastly, it turns out that this code is faster than the input list
handling, so lots of winning all around.

special thanks to hrvoje popovski and aaron bieber for testing.
this has been in snaps as part of a larger diff for over a week.


# 1.147 10-Jul-2020 patrick

Change users of IFQ_DEQUEUE(), IFQ_ENQUEUE() and IFQ_LEN() to use the
"new" API.

ok dlg@ tobhe@


# 1.146 17-Jun-2020 dlg

make ph_flowid in mbufs 16bits by storing whether it's set in csum_flags.

i've been wanting to do this for a while, and now that we've got
stoeplitz and it gives us 16 bits, it seems like the right time.


# 1.145 21-May-2020 dlg

don't limit the output queue (ifq) length to 1 anymore.

if we use the ifq to move packet processing to another context,
it's too easy to fill up the one slot and cause packet loss.

the ifq len was set to 1 to avoid delays produced by the original
implementation of tx mitigation. however, trunk now introduces
latency because it isn't mpsafe yet, which causes the network stack
to have to take the kernel lock for each packet, and the kernel
lock can be quite contended. i want to use the ifq to move the
packet to the systq thread (which already has the kernel lock)
before trunk is asked to transmit it.

tested by mark patruck and myself.


Revision tags: OPENBSD_6_7_BASE
# 1.144 06-Dec-2019 dlg

when copying capabilities from the first port to a trunk, copy hardmtu too.

previously it copied the ports if_mtu to the trunks if_hardmtu,
which makes it hard for things like vlan(4) to work with a full
frame size, or large frame size.

tested by hrvoje popovski


# 1.143 07-Nov-2019 dlg

turn the linkstate hooks into a task list, like the detach hooks.

this is largely mechanical, except for carp. this moves the addition
of the carp link state hook after we're committed to using the new
interface as a carpdev. because the add can't fail, we avoid a
complicated unwind dance. also, this tweaks the carp linkstate hook
so it only updates the relevant carp interface, not all of the
carpdevs on the parent.

hrvoje popovski has tested an early version of this diff and it's
generally ok, but there's some splasserts that this diff fires that
i'll fix in an upcoming diff.

ok claudio@


# 1.142 06-Nov-2019 dlg

replace the hooks used with if_detachhooks with a task list.

the main semantic change is that things registering detach hooks
have to allocate and set a task structure that then gets added to
the list. this means if the task is allocated up front (eg, as part
of carps softc or bridges port structure), it avoids the possibility
that adding a hook can fail. a lot of drivers weren't checking for
failure, and unwinding state in the event of failure in other parts
was error prone.

while doing this i discovered that the list operations have to be
in a particular order, but drivers weren't doing that consistently
either. this diff wraps the list ops up so you have to seriously
go out of your way to screw them up.

ive also sprinkled some NET_ASSERT_LOCKED around the list operations
so we can make sure there's no potential for the list to be corrupted,
especially while it's being run.

hrvoje popovski has tested this a bit, and some issues he discovered
have been fixed.

ok sashan@
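
The "allocate the task up front so adding a hook cannot fail" point in this entry can be sketched in plain C. Everything here (`hook_task`, `hook_list`, `port_softc`) is a hypothetical stand-in, not the kernel's task or detach-hook API: the task lives inside the owner's structure, so registering it is only pointer manipulation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook task; embedded in its owner, never allocated on add. */
struct hook_task {
	void (*fn)(void *);
	void *arg;
	struct hook_task *next;
	int onlist;
};

struct hook_list {
	struct hook_task *head;
};

void
hook_task_set(struct hook_task *t, void (*fn)(void *), void *arg)
{
	t->fn = fn;
	t->arg = arg;
	t->next = NULL;
	t->onlist = 0;
}

/* Adding cannot fail: no memory is allocated here. */
void
hook_add(struct hook_list *l, struct hook_task *t)
{
	assert(!t->onlist);
	t->next = l->head;
	l->head = t;
	t->onlist = 1;
}

void
hook_del(struct hook_list *l, struct hook_task *t)
{
	struct hook_task **tp;

	for (tp = &l->head; *tp != NULL; tp = &(*tp)->next) {
		if (*tp == t) {
			*tp = t->next;
			t->next = NULL;
			t->onlist = 0;
			return;
		}
	}
}

/* Run every hook; fetch the next pointer first in case a hook removes itself. */
void
hook_run(struct hook_list *l)
{
	struct hook_task *t, *nt;

	for (t = l->head; t != NULL; t = nt) {
		nt = t->next;
		t->fn(t->arg);
	}
}

/* Example owner: the task is part of the (hypothetical) port softc. */
struct port_softc {
	int detached;
	struct hook_task dtask;
};

void
port_detach(void *arg)
{
	struct port_softc *sc = arg;

	sc->detached = 1;
}
```

With the task embedded in the owner, there is no failure path to unwind when the hook is registered, which is the property the commit message is after.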


Revision tags: OPENBSD_6_6_BASE
# 1.141 05-Jul-2019 dlg

record when trunk takes over an interface by setting ac_trunkport

this will be used to prevent trunk and the upcoming aggr driver
from taking ownership of an Ethernet interface at the same time.


# 1.140 11-May-2019 florian

A trunk(4) usually stays up when the link state of one of its members
changes. While we do get RTM_IFINFO messages for the (physical) member
interfaces there is no indication that something changed from the
trunk(4) interface.
It is helpful to get this information in userland from the trunk so that
userland daemons do not need to track interface membership by themselves.
OK phessler


# 1.139 29-Apr-2019 dlg

tr_unit is unused, so gc it


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this lets input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_cast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@
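
The failure mode described in this entry is easy to reproduce outside the kernel. In the sketch below, the `port_result` array is a hypothetical stand-in for the per-port enqueue results; the function names are invented for the example and this is not the code of trunk_cast_start().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Buggy shape: "ret" is reused for every port, so a later success
 * overwrites an earlier error and the caller sees 0.
 */
int
cast_start_buggy(const int *port_result, size_t nports)
{
	int ret = 0;
	size_t i;

	assert(port_result != NULL || nports == 0);
	for (i = 0; i < nports; i++)
		ret = port_result[i];
	return (ret);
}

/* Fixed shape: remember that some port failed and report ENOBUFS. */
int
cast_start_fixed(const int *port_result, size_t nports)
{
	int errors = 0;
	size_t i;

	assert(port_result != NULL || nports == 0);
	for (i = 0; i < nports; i++) {
		if (port_result[i] != 0)
			errors++;
	}
	return (errors ? ENOBUFS : 0);
}
```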


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restores the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@
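
A minimal sketch of the shortcut described in this entry, under stated assumptions: `pkt_stub` is a hypothetical packet descriptor, and the fallback hash is FNV-1a used purely as a cheap stand-in for siphash24. It only shows the control flow (use the flow id when present, hash headers otherwise).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet descriptor carrying an optional flow id. */
struct pkt_stub {
	uint32_t flowid;
	int flowid_valid;
	const uint8_t *hdr;	/* header bytes used by the fallback hash */
	size_t hdrlen;
};

/* Stand-in for the expensive header hash (FNV-1a, NOT siphash24). */
static uint32_t
fallback_hash(const uint8_t *p, size_t len)
{
	uint32_t h = 2166136261u;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return (h);
}

/* Pick an output port: a valid flow id skips header parsing and hashing. */
unsigned int
pick_port(const struct pkt_stub *p, unsigned int nports)
{
	uint32_t h;

	assert(nports > 0);
	if (p->flowid_valid)
		h = p->flowid;
	else
		h = fallback_hash(p->hdr, p->hdrlen);
	return (h % nports);
}
```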


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is a slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
information in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi
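
The ownership problem in this entry can be modelled with ordinary heap buffers. The sketch below is an assumption-laden model, not mbuf code: `buf_pullup()` imitates the documented m_pullup(9) behaviour of freeing the chain and returning NULL on failure, and it always returns a fresh buffer on success to model the worst case. The routine that called it is then the only code that knows which pointer is still valid, so it does the freeing.

```c
#include <assert.h>
#include <stdlib.h>

static int live_bufs;		/* buffers currently allocated */

struct buf {
	size_t len;
};

struct buf *
buf_alloc(size_t len)
{
	struct buf *b = malloc(sizeof(*b));

	if (b != NULL) {
		b->len = len;
		live_bufs++;
	}
	return (b);
}

void
buf_free(struct buf *b)
{
	assert(b != NULL);
	free(b);
	live_bufs--;
}

/*
 * Model of m_pullup(9): on failure the input is freed and NULL is
 * returned; on success a different buffer may come back, so the
 * caller's old pointer must not be used again.
 */
struct buf *
buf_pullup(struct buf *b, size_t need)
{
	struct buf *n;

	if (b->len < need) {
		buf_free(b);
		return (NULL);
	}
	n = buf_alloc(b->len);
	buf_free(b);
	return (n);
}

/* The protocol input routine owns the buffer and frees what it ends up with. */
int
proto_input(struct buf *b)
{
	b = buf_pullup(b, 4);
	if (b == NULL)
		return (-1);	/* already freed by buf_pullup() */
	/* ... parse the contiguous header here ... */
	buf_free(b);
	return (0);
}
```

If the outer caller freed its own copy of the pointer instead, it would free either a stale pointer or the wrong buffer, which is the mishandling the commit avoids.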


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way to do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids getting trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should never be called as we don't set up its
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with a similar
lifetime.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers; we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
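
The index-instead-of-pointer scheme in this entry can be illustrated with a small lookup table. All names here (`if_table`, `stub_if_get`, `pkt_stub`) are hypothetical and the lookup is deliberately simplified; the real if_get() also deals with reference counting, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

#define MAXIF 8			/* hypothetical table size */

struct ifnet_stub {
	unsigned int if_index;
	const char *if_xname;
};

/* index -> interface; a slot is NULL once the interface is gone */
static struct ifnet_stub *if_table[MAXIF];

/* Simplified lookup in the spirit of if_get(): NULL means "no such interface". */
struct ifnet_stub *
stub_if_get(unsigned int idx)
{
	return (idx < MAXIF ? if_table[idx] : NULL);
}

/* The packet header stores only the index of the receiving interface. */
struct pkt_stub {
	unsigned int ph_ifidx;
};

/* Returns 1 if the packet can be processed, 0 if it must be dropped. */
int
pkt_input(const struct pkt_stub *p)
{
	struct ifnet_stub *ifp;

	assert(p != NULL);
	ifp = stub_if_get(p->ph_ifidx);
	if (ifp == NULL)
		return (0);	/* interface vanished: the packet would be freed */
	return (1);
}
```

A stale index simply fails the lookup, whereas a stale pointer stored in the packet would be dereferenced.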


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rdnvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@
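
The MTU rule in this entry (first port sets the trunk MTU, later ports must match) can be sketched as follows. The structure and function names are hypothetical, and returning EINVAL for a mismatch is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical trunk state: current MTU and number of member ports. */
struct trunk_stub {
	unsigned int mtu;
	unsigned int nports;
};

/*
 * Add a port with the given MTU. The first port defines the trunk MTU;
 * any later port has to match it. EINVAL is an assumed error code.
 */
int
trunk_add_port_mtu(struct trunk_stub *tr, unsigned int port_mtu)
{
	assert(tr != NULL);
	if (tr->nports == 0)
		tr->mtu = port_mtu;
	else if (tr->mtu != port_mtu)
		return (EINVAL);
	tr->nports++;
	return (0);
}
```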


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@
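
The "copy before transmit, hand the original to the last interface" dance from this entry can be modelled with plain heap buffers. This is a sketch under assumptions: `pkt_stub` and `port_send()` are hypothetical, and `port_send()` deliberately scribbles on the buffer to imitate a driver modifying the chain it was given.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct pkt_stub {
	size_t len;
	unsigned char data[64];
};

/*
 * Hypothetical transmit: records what the port saw, then modifies and
 * consumes the packet, as a real driver is allowed to do.
 */
static void
port_send(struct pkt_stub *p, unsigned char *seen)
{
	*seen = p->data[0];
	p->data[0] ^= 0xff;
	free(p);
}

/*
 * Broadcast m on nports ports. Every port but the last gets a copy made
 * from the still untouched original; the last port gets the original.
 */
int
bcast_send(struct pkt_stub *m, size_t nports, unsigned char *seen)
{
	struct pkt_stub *copy;
	size_t i;

	assert(m != NULL);
	if (nports == 0) {
		free(m);
		return (-1);
	}
	for (i = 0; i + 1 < nports; i++) {
		copy = malloc(sizeof(*copy));
		if (copy == NULL) {
			free(m);
			return (-1);
		}
		memcpy(copy, m, sizeof(*copy));
		port_send(copy, &seen[i]);
	}
	port_send(m, &seen[nports - 1]);
	return (0);
}
```

Since no copy is ever taken from a buffer that has already been handed to a port, a driver modifying its packet cannot affect what the other ports receive.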


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@
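
The distinction between the current and the saved MAC address in this entry can be shown with a small model. The structure and function names are hypothetical; the point is only which field a newly added port copies from.

```c
#include <assert.h>
#include <string.h>

#define ETHER_ADDR_LEN 6

/* Hypothetical port: the MAC in use now, and the one to restore on removal. */
struct port_stub {
	unsigned char lladdr[ETHER_ADDR_LEN];
	unsigned char saved_lladdr[ETHER_ADDR_LEN];
};

/* Join: save the port's own MAC, then adopt the master's CURRENT MAC. */
void
port_join(const struct port_stub *master, struct port_stub *tp)
{
	assert(master != tp);
	memcpy(tp->saved_lladdr, tp->lladdr, ETHER_ADDR_LEN);
	memcpy(tp->lladdr, master->lladdr, ETHER_ADDR_LEN);
}

/* Leave: the saved MAC is used only to restore the original address. */
void
port_leave(struct port_stub *tp)
{
	memcpy(tp->lladdr, tp->saved_lladdr, ETHER_ADDR_LEN);
}
```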


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interfaces start
routine.

it also prevents an artificial inflation of the physical interfaces queue
length where packets could hide on the virtual interfaces queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@
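
The divide-by-zero in this entry comes from a modulo by the port count. A minimal sketch of the guard, with hypothetical names and a hypothetical `MAX_PORTS` limit:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PORTS 32		/* hypothetical upper bound on member ports */

/*
 * Map a hash to a port index. With no ports the modulo would divide by
 * zero, so that case is rejected before the division is reached.
 */
int
lb_pick_port(uint32_t hash, unsigned int count)
{
	assert(count <= MAX_PORTS);
	if (count == 0)
		return (-1);
	return ((int)(hash % count));
}
```

The result of `hash % count` is always smaller than `count`, so once `count` is bounded no further range check on the index is needed.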


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as theyre figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frame on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


# 1.152 02-Aug-2021 mvs

Don't call rtm_ifchg() in trunk_port_state().

The preceding trunk_link_active() already produced RTM_IFINFO message when
trunk(4) state was changed. I such case we double RTM_IFINFO message or we
produce false message when trunk(4) state was not changed.

ok florian@


Revision tags: OPENBSD_6_9_BASE
# 1.151 28-Jan-2021 mvs

trunk(4): convert ifunit to if_unit(9)

ok bluhm@


Revision tags: OPENBSD_6_8_BASE
# 1.150 12-Sep-2020 kn

Keep port interface UP on removal

There is no reason to change flags on member interfaces when removing
them, aggr(4) does not pull its members down either.

OK florian bluhm


# 1.149 28-Jul-2020 mvs

Add missing `IFXF_CLONED' flag to clone interfaces.

ok mpi@


# 1.148 22-Jul-2020 dlg

deprecate interface input handler lists, just use one input function.

the interface input handler lists were originally set up to help
us during the intial mpsafe network stack work. at the time not all
the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc)
were mpsafe, so we wanted a way to avoid them by default, and only
take the kernel lock hit when they were specifically enabled on the
interface. since then, they have been fixed up to be mpsafe.

i could leave the list in place, but it has some semantic problems.
because virtual interfaces filter packets based on the order they
were attached to the parent interface, you can get packets taken
away in surprising ways, especially when you reboot and netstart
does something different to what you did by hand. by hardcoding the
order that things like vlan and bridge get to look at packets, we
can document the behaviour and get consistency.

it also means we can get rid of a use of SRPs which were difficult
to replace with SMRs. the interface input handler list is an SRPL,
which we would like to deprecate. it turns out that you can sleep
during stack processing, which you're not supposed to do with SRPs
or SMRs, but SRPs are a lot more forgiving and it worked.

lastly, it turns out that this code is faster than the input list
handling, so lots of winning all around.

special thanks to hrvoje popovski and aaron bieber for testing.
this has been in snaps as part of a larger diff for over a week.


# 1.147 10-Jul-2020 patrick

Change users of IFQ_DEQUEUE(), IFQ_ENQUEUE() and IFQ_LEN() to use the
"new" API.

ok dlg@ tobhe@


# 1.146 17-Jun-2020 dlg

make ph_flowid in mbufs 16bits by storing whether it's set in csum_flags.

i've been wanting to do this for a while, and now that we've got
stoeplitz and it gives us 16 bits, it seems like the right time.


# 1.145 21-May-2020 dlg

don't limit the output queue (ifq) length to 1 anymore.

if we use the ifq to move packet processing to another context,
it's too easy to fill up the one slot and cause packet loss.

the ifq len was set to 1 to avoid delays produced by the original
implementation of tx mitigation. however, trunk now introduces
latency because it isn't mpsafe yet, which causes the network stack
to have to take the kernel lock for each packet, and the kernel
lock can be quite contended. i want to use the ifq to move the
packet to the systq thread (which already has the kernel lock)
before trunk is asked to transmit it.

tested by mark patruck and myself.


Revision tags: OPENBSD_6_7_BASE
# 1.144 06-Dec-2019 dlg

when copying capabilities from the first port to a trunk, copy hardmtu too.

previously it copied the ports if_mtu to the trunks if_hardmtu,
which makes it hard for things like vlan(4) to work with a full
frame size, or large frame size.

tested by hrvoje popovski


# 1.143 07-Nov-2019 dlg

turn the linkstate hooks into a task list, like the detach hooks.

this is largely mechanical, except for carp. this moves the addition
of the carp link state hook after we're committed to using the new
interface as a carpdev. because the add can't fail, we avoid a
complicated unwind dance. also, this tweaks the carp linkstate hook
so it only updates the relevant carp interface, not all of the
carpdevs on the parent.

hrvoje popovski has tested an early version of this diff and it's
generally ok, but there's some splasserts that this diff fires that
i'll fix in an upcoming diff.

ok claudio@


# 1.142 06-Nov-2019 dlg

replace the hooks used with if_detachhooks with a task list.

the main semantic change is that things registering detach hooks
have to allocate and set a task structure that then gets added to
the list. this means if the task is allocated up front (eg, as part
of carps softc or bridges port structure), it avoids the possibility
that adding a hook can fail. a lot of drivers weren't checking for
failure, and unwinding state in the event of failure in other parts
was error prone.

while doing this i discovered that the list operations have to be
in a particular order, but drivers weren't doing that consistently
either. this diff wraps the list ops up so you have to seriously
go out of your way to screw them up.

ive also sprinkled some NET_ASSERT_LOCKED around the list operations
so we can make sure there's no potential for the list to be corrupted,
especially while it's being run.

hrvoje popovski has tested this a bit, and some issues he discovered
have been fixed.

ok sashan@


Revision tags: OPENBSD_6_6_BASE
# 1.141 05-Jul-2019 dlg

record when trunk takes over an interface by setting ac_trunkport

this will be used to prevent trunk and the upcoming aggr driver
from taking ownership of an Ethernet interface at the same time.


# 1.140 11-May-2019 florian

A trunk(4) usually stays up when the link state of one of its members
changes. While we do get RTM_IFINFO messages for the (physical) member
interfaces there is no indication that something changed from the
trunk(4) interface.
It is helpful to get this information in userland from the trunk so that
userland daemons do not need to track interface membership by themselves.
OK phessler


# 1.139 29-Apr-2019 dlg

tr_unit is unused, so gc it


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this let's input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer mess with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_cast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restore the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
informations in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids getting trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should be never called as we don't set up it's
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroy/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function do to the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register it's own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register it's own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rdnvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similiar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_map_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while hte vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check trunk_enqueue() calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) has discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of ammortising the cost of calling a real interfaces start
routine.

it also prevents an artificial inflation of the physical interfaces queue
length where packets could hide on the virtual interfaces queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@
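
The failure mode can be modelled outside the kernel: load balancing picks a
port as hash modulo port count, which divides by zero when the count is 0.
The sketch below is a Python model under that assumption, not the actual
patch; `lb_select_port` is a hypothetical name.

```python
def lb_select_port(flow_hash, ports):
    """Pick the port for a flow, or None when the trunk has no ports."""
    count = len(ports)
    if count == 0:
        # Without this guard, flow_hash % count would divide by zero.
        return None
    return ports[flow_hash % count]
```

A caller would treat `None` as "drop the packet and count an output error".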


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as they're figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@
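
A rough model of the idea, assuming the frame is a mutable buffer: the
first active port receives the original buffer and only the remaining
active ports receive copies, which saves one copy per transmitted frame.
This is an illustrative Python sketch, not the driver code; `bcast_start`
and the `(name, is_active)` port tuples are hypothetical.

```python
def bcast_start(frame, ports):
    """Return (port name, buffer) pairs for every active port.

    ports is an ordered list of (name, is_active) tuples.
    """
    sent = []
    for name, is_active in ports:
        if not is_active:
            continue
        # The first active port reuses the input buffer; later ones get copies.
        buf = frame if not sent else bytearray(frame)
        sent.append((name, buf))
    return sent
```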


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frames on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@
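
The selection rule described above can be sketched as follows. This is a
simplified Python model, not the driver's implementation: it assumes "next
active port" means the first active port in attachment order, and
`failover_select` is a hypothetical name.

```python
def failover_select(ports, master):
    """Return the port to use in failover mode, or None if all are down.

    ports is an ordered list of (name, is_active) tuples; master is the
    name of the master port.
    """
    # The master port is used whenever it is active.
    for name, is_active in ports:
        if name == master and is_active:
            return name
    # Otherwise fall back to the first other port that is active.
    for name, is_active in ports:
        if is_active:
            return name
    return None
```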


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


# 1.151 28-Jan-2021 mvs

trunk(4): convert ifunit to if_unit(9)

ok bluhm@


Revision tags: OPENBSD_6_8_BASE
# 1.150 12-Sep-2020 kn

Keep port interface UP on removal

There is no reason to change flags on member interfaces when removing
them, aggr(4) does not pull its members down either.

OK florian bluhm


# 1.149 28-Jul-2020 mvs

Add missing `IFXF_CLONED' flag to clone interfaces.

ok mpi@


# 1.148 22-Jul-2020 dlg

deprecate interface input handler lists, just use one input function.

the interface input handler lists were originally set up to help
us during the intial mpsafe network stack work. at the time not all
the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc)
were mpsafe, so we wanted a way to avoid them by default, and only
take the kernel lock hit when they were specifically enabled on the
interface. since then, they have been fixed up to be mpsafe.

i could leave the list in place, but it has some semantic problems.
because virtual interfaces filter packets based on the order they
were attached to the parent interface, you can get packets taken
away in surprising ways, especially when you reboot and netstart
does something different to what you did by hand. by hardcoding the
order that things like vlan and bridge get to look at packets, we
can document the behaviour and get consistency.

it also means we can get rid of a use of SRPs which were difficult
to replace with SMRs. the interface input handler list is an SRPL,
which we would like to deprecate. it turns out that you can sleep
during stack processing, which you're not supposed to do with SRPs
or SMRs, but SRPs are a lot more forgiving and it worked.

lastly, it turns out that this code is faster than the input list
handling, so lots of winning all around.

special thanks to hrvoje popovski and aaron bieber for testing.
this has been in snaps as part of a larger diff for over a week.


# 1.147 10-Jul-2020 patrick

Change users of IFQ_DEQUEUE(), IFQ_ENQUEUE() and IFQ_LEN() to use the
"new" API.

ok dlg@ tobhe@


# 1.146 17-Jun-2020 dlg

make ph_flowid in mbufs 16bits by storing whether it's set in csum_flags.

i've been wanting to do this for a while, and now that we've got
stoeplitz and it gives us 16 bits, it seems like the right time.


# 1.145 21-May-2020 dlg

don't limit the output queue (ifq) length to 1 anymore.

if we use the ifq to move packet processing to another context,
it's too easy to fill up the one slot and cause packet loss.

the ifq len was set to 1 to avoid delays produced by the original
implementation of tx mitigation. however, trunk now introduces
latency because it isn't mpsafe yet, which causes the network stack
to have to take the kernel lock for each packet, and the kernel
lock can be quite contended. i want to use the ifq to move the
packet to the systq thread (which already has the kernel lock)
before trunk is asked to transmit it.

tested by mark patruck and myself.


Revision tags: OPENBSD_6_7_BASE
# 1.144 06-Dec-2019 dlg

when copying capabilities from the first port to a trunk, copy hardmtu too.

previously it copied the ports if_mtu to the trunks if_hardmtu,
which makes it hard for things like vlan(4) to work with a full
frame size, or large frame size.

tested by hrvoje popovski


# 1.143 07-Nov-2019 dlg

turn the linkstate hooks into a task list, like the detach hooks.

this is largely mechanical, except for carp. this moves the addition
of the carp link state hook after we're committed to using the new
interface as a carpdev. because the add can't fail, we avoid a
complicated unwind dance. also, this tweaks the carp linkstate hook
so it only updates the relevant carp interface, not all of the
carpdevs on the parent.

hrvoje popovski has tested an early version of this diff and it's
generally ok, but there's some splasserts that this diff fires that
i'll fix in an upcoming diff.

ok claudio@


# 1.142 06-Nov-2019 dlg

replace the hooks used with if_detachhooks with a task list.

the main semantic change is that things registering detach hooks
have to allocate and set a task structure that then gets added to
the list. this means if the task is allocated up front (eg, as part
of carps softc or bridges port structure), it avoids the possibility
that adding a hook can fail. a lot of drivers weren't checking for
failure, and unwinding state in the event of failure in other parts
was error prone.

while doing this i discovered that the list operations have to be
in a particular order, but drivers weren't doing that consistently
either. this diff wraps the list ops up so you have to seriously
go out of your way to screw them up.

ive also sprinkled some NET_ASSERT_LOCKED around the list operations
so we can make sure there's no potential for the list to be corrupted,
especially while it's being run.

hrvoje popovski has tested this a bit, and some issues he discovered
have been fixed.

ok sashan@


Revision tags: OPENBSD_6_6_BASE
# 1.141 05-Jul-2019 dlg

record when trunk takes over an interface by setting ac_trunkport

this will be used to prevent trunk and the upcoming aggr driver
from taking ownership of an Ethernet interface at the same time.


# 1.140 11-May-2019 florian

A trunk(4) usually stays up when the link state of one of its members
changes. While we do get RTM_IFINFO messages for the (physical) member
interfaces there is no indication that something changed from the
trunk(4) interface.
It is helpful to get this information in userland from the trunk so that
userland daemons do not need to track interface membership by themselves.
OK phessler


# 1.139 29-Apr-2019 dlg

tr_unit is unused, so gc it


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this let's input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer mess with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_cast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restore the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
informations in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids getting trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should be never called as we don't set up it's
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroy/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function do to the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register it's own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register it's own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rdnvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similiar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_map_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revisions limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interface's start
routine.

it also prevents an artificial inflation of the physical interfaces queue
length where packets could hide on the virtual interfaces queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@
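
A minimal sketch of the kind of guard this implies, in plain C.
pick_port() is a hypothetical helper for illustration, not the
function the patch touched; the point is only that the modulo is never
evaluated with a zero port count.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a flow hash to a port index. Returns -1 when
 * the trunk has no ports, so hash % port_count is never computed with a
 * zero divisor.
 */
static int
pick_port(uint32_t hash, uint32_t port_count)
{
	if (port_count == 0)
		return (-1);
	return ((int)(hash % port_count));
}
```

A caller would treat -1 as "nothing to send on" and drop the packet
instead of indexing a port.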


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@
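
The difference between freeing one buffer and freeing the whole chain
can be sketched with a toy linked list. The toy_* names below are
hypothetical stand-ins; they only mirror the documented behaviour that
m_free() releases a single mbuf and returns its successor, while
m_freem() releases the entire chain.

```c
#include <stdlib.h>

/* Toy stand-in for an mbuf: just a link to the next buffer. */
struct toy_mbuf {
	struct toy_mbuf *next;
};

/* Number of toy buffers currently allocated. */
static int live;

/* Allocate one buffer in front of "next"; returns NULL on failure. */
static struct toy_mbuf *
toy_get(struct toy_mbuf *next)
{
	struct toy_mbuf *m = malloc(sizeof(*m));

	if (m == NULL)
		return (NULL);
	m->next = next;
	live++;
	return (m);
}

/* Like m_free(): release one buffer and return its successor. */
static struct toy_mbuf *
toy_free(struct toy_mbuf *m)
{
	struct toy_mbuf *next = m->next;

	free(m);
	live--;
	return (next);
}

/* Like m_freem(): release every buffer in the chain. */
static void
toy_freem(struct toy_mbuf *m)
{
	while (m != NULL)
		m = toy_free(m);
}
```

Calling toy_free() on the head of a three-buffer chain and discarding
the return value would leave two buffers allocated, which is the leak
pattern the commit describes.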


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as theyre figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frames on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


# 1.150 12-Sep-2020 kn

Keep port interface UP on removal

There is no reason to change flags on member interfaces when removing
them, aggr(4) does not pull its members down either.

OK florian bluhm


# 1.149 28-Jul-2020 mvs

Add missing `IFXF_CLONED' flag to clone interfaces.

ok mpi@


# 1.148 22-Jul-2020 dlg

deprecate interface input handler lists, just use one input function.

the interface input handler lists were originally set up to help
us during the initial mpsafe network stack work. at the time not all
the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc)
were mpsafe, so we wanted a way to avoid them by default, and only
take the kernel lock hit when they were specifically enabled on the
interface. since then, they have been fixed up to be mpsafe.

i could leave the list in place, but it has some semantic problems.
because virtual interfaces filter packets based on the order they
were attached to the parent interface, you can get packets taken
away in surprising ways, especially when you reboot and netstart
does something different to what you did by hand. by hardcoding the
order that things like vlan and bridge get to look at packets, we
can document the behaviour and get consistency.

it also means we can get rid of a use of SRPs which were difficult
to replace with SMRs. the interface input handler list is an SRPL,
which we would like to deprecate. it turns out that you can sleep
during stack processing, which you're not supposed to do with SRPs
or SMRs, but SRPs are a lot more forgiving and it worked.

lastly, it turns out that this code is faster than the input list
handling, so lots of winning all around.

special thanks to hrvoje popovski and aaron bieber for testing.
this has been in snaps as part of a larger diff for over a week.


# 1.147 10-Jul-2020 patrick

Change users of IFQ_DEQUEUE(), IFQ_ENQUEUE() and IFQ_LEN() to use the
"new" API.

ok dlg@ tobhe@


# 1.146 17-Jun-2020 dlg

make ph_flowid in mbufs 16bits by storing whether it's set in csum_flags.

i've been wanting to do this for a while, and now that we've got
stoeplitz and it gives us 16 bits, it seems like the right time.


# 1.145 21-May-2020 dlg

don't limit the output queue (ifq) length to 1 anymore.

if we use the ifq to move packet processing to another context,
it's too easy to fill up the one slot and cause packet loss.

the ifq len was set to 1 to avoid delays produced by the original
implementation of tx mitigation. however, trunk now introduces
latency because it isn't mpsafe yet, which causes the network stack
to have to take the kernel lock for each packet, and the kernel
lock can be quite contended. i want to use the ifq to move the
packet to the systq thread (which already has the kernel lock)
before trunk is asked to transmit it.

tested by mark patruck and myself.


Revision tags: OPENBSD_6_7_BASE
# 1.144 06-Dec-2019 dlg

when copying capabilities from the first port to a trunk, copy hardmtu too.

previously it copied the ports if_mtu to the trunks if_hardmtu,
which makes it hard for things like vlan(4) to work with a full
frame size, or large frame size.

tested by hrvoje popovski


# 1.143 07-Nov-2019 dlg

turn the linkstate hooks into a task list, like the detach hooks.

this is largely mechanical, except for carp. this moves the addition
of the carp link state hook after we're committed to using the new
interface as a carpdev. because the add can't fail, we avoid a
complicated unwind dance. also, this tweaks the carp linkstate hook
so it only updates the relevant carp interface, not all of the
carpdevs on the parent.

hrvoje popovski has tested an early version of this diff and it's
generally ok, but there's some splasserts that this diff fires that
i'll fix in an upcoming diff.

ok claudio@


# 1.142 06-Nov-2019 dlg

replace the hooks used with if_detachhooks with a task list.

the main semantic change is that things registering detach hooks
have to allocate and set a task structure that then gets added to
the list. this means if the task is allocated up front (eg, as part
of carps softc or bridges port structure), it avoids the possibility
that adding a hook can fail. a lot of drivers weren't checking for
failure, and unwinding state in the event of failure in other parts
was error prone.

while doing this i discovered that the list operations have to be
in a particular order, but drivers weren't doing that consistently
either. this diff wraps the list ops up so you have to seriously
go out of your way to screw them up.

ive also sprinkled some NET_ASSERT_LOCKED around the list operations
so we can make sure there's no potential for the list to be corrupted,
especially while it's being run.

hrvoje popovski has tested this a bit, and some issues he discovered
have been fixed.

ok sashan@


Revision tags: OPENBSD_6_6_BASE
# 1.141 05-Jul-2019 dlg

record when trunk takes over an interface by setting ac_trunkport

this will be used to prevent trunk and the upcoming aggr driver
from taking ownership of an Ethernet interface at the same time.


# 1.140 11-May-2019 florian

A trunk(4) usually stays up when the link state of one of its members
changes. While we do get RTM_IFINFO messages for the (physical) member
interfaces there is no indication that something changed from the
trunk(4) interface.
It is helpful to get this information in userland from the trunk so that
userland daemons do not need to track interface membership by themselves.
OK phessler


# 1.139 29-Apr-2019 dlg

tr_unit is unused, so gc it


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this lets input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_cast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@
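
As a generic illustration of the pitfall (not the code from
trunk_cast_start(), whose details are not shown here), the sketch below
contrasts a loop that overwrites its status variable with one that
reports ENOBUFS if any send failed. The send_fn type and the port_ok()
and port_fail() stubs are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-port send callback: returns 0 or an errno value. */
typedef int (*send_fn)(void);

static int
port_ok(void)
{
	return (0);
}

static int
port_fail(void)
{
	return (ENOBUFS);
}

/* Pitfall: a later success overwrites an earlier failure. */
static int
send_all_overwrite(const send_fn *ports, size_t n)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++)
		ret = ports[i]();
	return (ret);
}

/* Alternative: remember that something failed and report ENOBUFS. */
static int
send_all_sticky(const send_fn *ports, size_t n)
{
	int failed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ports[i]() != 0)
			failed = 1;
	}
	return (failed ? ENOBUFS : 0);
}
```

With a failing port followed by a working one, the first variant
returns 0 and the second returns ENOBUFS.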


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restores the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@
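
The shortcut can be sketched as follows. Everything here is a
hypothetical stand-in: struct toy_pkt is not an mbuf, the field widths
are chosen for the example, and toy_header_hash() uses simple FNV-style
mixing in place of siphash24.

```c
#include <stdint.h>

/* Toy packet: an optional precomputed flow id plus two header fields. */
struct toy_pkt {
	int has_flowid;		/* nonzero if flowid is valid */
	uint32_t flowid;
	uint32_t src;
	uint32_t dst;
};

/* Stand-in for the expensive path: hash fields parsed from headers. */
static uint32_t
toy_header_hash(const struct toy_pkt *p)
{
	uint32_t h = 2166136261u;

	h = (h ^ p->src) * 16777619u;
	h = (h ^ p->dst) * 16777619u;
	return (h);
}

/* Prefer the flow id when one is set; otherwise hash the headers. */
static uint32_t
toy_flow_hash(const struct toy_pkt *p)
{
	if (p->has_flowid)
		return (p->flowid);
	return (toy_header_hash(p));
}
```

When the flow id is valid the header parsing and hashing are skipped
entirely, which is where the saving described above comes from.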


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
information in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way to do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids getting trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should never be called as we don't set up its
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers, since we know the
ifp is live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@
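
The MTU rule above can be sketched in plain userland C. This is only an illustration under stated assumptions: `struct trunk_cfg` and `trunk_add_port_mtu()` are hypothetical names, not the driver's structures or functions.

```c
#include <stddef.h>

/* Hypothetical trunk state; not the driver's struct trunk_softc. */
struct trunk_cfg {
	unsigned int	mtu;
	size_t		nports;
};

/*
 * Add a port with the given MTU. The first port sets the trunk MTU;
 * any later port must match it. Returns 0 on success, -1 on mismatch.
 */
static int
trunk_add_port_mtu(struct trunk_cfg *tr, unsigned int port_mtu)
{
	if (tr->nports == 0)
		tr->mtu = port_mtu;
	else if (tr->mtu != port_mtu)
		return (-1);
	tr->nports++;
	return (0);
}
```

Rejecting a mismatched port, rather than silently lowering the trunk MTU, keeps the MTU seen by the stack stable while ports come and go.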


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@
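
A minimal sketch of the copy-before-transmit pattern described above, in plain userland C. The `struct pkt` and `struct port` types and the `pkt_dup()`, `port_send()` and `bcast_send()` helpers are hypothetical stand-ins for mbufs, m_copym() and the port transmit path; this is not the driver's code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: a heap buffer a port may modify. */
struct pkt {
	size_t	len;
	char	data[64];
};

/* Hypothetical stand-in for a trunk port; records what it was handed. */
struct port {
	char	seen[64];
};

static struct pkt *
pkt_dup(const struct pkt *p)
{
	struct pkt *n = malloc(sizeof(*n));

	if (n != NULL)
		memcpy(n, p, sizeof(*n));
	return (n);
}

/* The port consumes the packet and is free to scribble on it first. */
static void
port_send(struct port *tp, struct pkt *p)
{
	memcpy(tp->seen, p->data, sizeof(tp->seen));
	memset(p->data, 'X', sizeof(p->data));	/* simulate modification */
	free(p);
}

/*
 * Send p on every port. Each port except the last gets a copy taken
 * from the untouched original before transmit; the last port gets the
 * original itself, outside the loop. Returns 0, or -1 on failure.
 */
static int
bcast_send(struct port *ports, size_t nports, struct pkt *p)
{
	struct pkt *copy;
	size_t i;

	if (nports == 0) {
		free(p);
		return (-1);
	}
	for (i = 0; i + 1 < nports; i++) {
		copy = pkt_dup(p);
		if (copy == NULL) {
			free(p);
			return (-1);
		}
		port_send(&ports[i], copy);
	}
	port_send(&ports[nports - 1], p);
	return (0);
}
```

Because every copy is taken from the original before any port has touched it, a port that modifies its packet cannot corrupt what the next port receives.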


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@
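
The difference between freeing one buffer and freeing a chain can be illustrated with a toy linked list. `struct buf`, `buf_free()` and `buf_freem()` are hypothetical userland analogues of an mbuf chain, m_free() and m_freem(); the `nfreed` counter exists only to make the effect visible.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf linked into a chain. */
struct buf {
	struct buf *next;
};

static int nfreed;	/* counts released buffers, for illustration */

/* Like m_free(): release one buffer and return its successor. */
static struct buf *
buf_free(struct buf *b)
{
	struct buf *n = b->next;

	free(b);
	nfreed++;
	return (n);
}

/* Like m_freem(): release every buffer in the chain. */
static void
buf_freem(struct buf *b)
{
	while (b != NULL)
		b = buf_free(b);
}
```

Calling the single-buffer variant on a chain and discarding its return value leaks every buffer after the first, which is the kind of leak the commit above describes.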


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interface's start
routine.

it also prevents an artificial inflation of the physical interface's queue
length where packets could hide on the virtual interfaces' queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@
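
One way to keep the VLAN priority out of a hash is to mask the 802.1Q tag down to its 12-bit VLAN ID first. The sketch below assumes the standard tag layout (3 priority bits, 1 drop-eligible bit, 12 ID bits); `VLAN_VID_MASK` and `vlan_hash_input()` are hypothetical names, not taken from the driver.

```c
#include <stdint.h>

#define VLAN_VID_MASK	0x0fff	/* low 12 bits of the 802.1Q tag */

/*
 * Return the part of the tag that should feed the hash: the VLAN ID
 * only, so frames of one VLAN hash alike whatever their priority.
 */
static uint16_t
vlan_hash_input(uint16_t tci)
{
	return (tci & VLAN_VID_MASK);
}
```

With the priority bits masked off, two frames of the same flow that differ only in priority are hashed to the same port.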


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code, rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@
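
The failure mode is a modulo by the port count when that count is zero. A minimal sketch of the guard, assuming a hash-modulo port selection; `lb_pick_port()` is a hypothetical helper, not the driver's function.

```c
#include <stdint.h>

/*
 * Pick a port index for a flow hash. Returns -1 when the trunk has
 * no ports, so the caller can drop the packet instead of evaluating
 * hash % 0.
 */
static int
lb_pick_port(uint32_t hash, unsigned int nports)
{
	if (nports == 0)
		return (-1);
	return ((int)(hash % nports));
}
```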


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as they're figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frames on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@
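
The rule "up as long as at least one trunkport is up" reduces to an any-of test over the ports. A minimal sketch, with a hypothetical `struct port_state` and `trunk_link_up()` standing in for the driver's port list and link state handling:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port state; not the driver's struct trunk_port. */
struct port_state {
	bool	link_up;
};

/* The trunk link is up as long as at least one port reports link. */
static bool
trunk_link_up(const struct port_state *ports, size_t nports)
{
	size_t i;

	for (i = 0; i < nports; i++) {
		if (ports[i].link_up)
			return (true);
	}
	return (false);
}
```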


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@
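
A minimal sketch of that selection rule: use the master while it is active, otherwise fall back to the next active port. `struct fo_port` and `failover_pick()` are hypothetical, and wrapping around to ports listed before the master is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port state; not the driver's struct trunk_port. */
struct fo_port {
	bool	active;
};

/*
 * Return the index of the port to use: the master when it is active,
 * otherwise the next active port after it (wrapping around, which is
 * an assumption here). Returns -1 when no port is active.
 */
static int
failover_pick(const struct fo_port *ports, size_t nports, size_t master)
{
	size_t i, idx;

	if (nports == 0 || master >= nports)
		return (-1);
	for (i = 0; i < nports; i++) {
		idx = (master + i) % nports;
		if (ports[idx].active)
			return ((int)idx);
	}
	return (-1);
}
```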


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.
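
The "smallest common capabilities" idea from the second item can be sketched as a bitwise AND across the ports. The flag names and values below are hypothetical and chosen only for illustration; they are not the kernel's IFCAP_* definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define CAP_VLAN_MTU	0x01	/* hypothetical flag values */
#define CAP_CSUM	0x02

/*
 * A capability is offered by the trunk only if every port has it.
 * A trunk without ports offers nothing.
 */
static uint32_t
trunk_common_caps(const uint32_t *port_caps, size_t nports)
{
	uint32_t caps;
	size_t i;

	if (nports == 0)
		return (0);
	caps = port_caps[0];
	for (i = 1; i < nports; i++)
		caps &= port_caps[i];
	return (caps);
}
```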


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


# 1.149 28-Jul-2020 mvs

Add missing `IFXF_CLONED' flag to clone interfaces.

ok mpi@


# 1.148 22-Jul-2020 dlg

deprecate interface input handler lists, just use one input function.

the interface input handler lists were originally set up to help
us during the intial mpsafe network stack work. at the time not all
the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc)
were mpsafe, so we wanted a way to avoid them by default, and only
take the kernel lock hit when they were specifically enabled on the
interface. since then, they have been fixed up to be mpsafe.

i could leave the list in place, but it has some semantic problems.
because virtual interfaces filter packets based on the order they
were attached to the parent interface, you can get packets taken
away in surprising ways, especially when you reboot and netstart
does something different to what you did by hand. by hardcoding the
order that things like vlan and bridge get to look at packets, we
can document the behaviour and get consistency.

it also means we can get rid of a use of SRPs which were difficult
to replace with SMRs. the interface input handler list is an SRPL,
which we would like to deprecate. it turns out that you can sleep
during stack processing, which you're not supposed to do with SRPs
or SMRs, but SRPs are a lot more forgiving and it worked.

lastly, it turns out that this code is faster than the input list
handling, so lots of winning all around.

special thanks to hrvoje popovski and aaron bieber for testing.
this has been in snaps as part of a larger diff for over a week.


# 1.147 10-Jul-2020 patrick

Change users of IFQ_DEQUEUE(), IFQ_ENQUEUE() and IFQ_LEN() to use the
"new" API.

ok dlg@ tobhe@


# 1.146 17-Jun-2020 dlg

make ph_flowid in mbufs 16bits by storing whether it's set in csum_flags.

i've been wanting to do this for a while, and now that we've got
stoeplitz and it gives us 16 bits, it seems like the right time.


# 1.145 21-May-2020 dlg

don't limit the output queue (ifq) length to 1 anymore.

if we use the ifq to move packet processing to another context,
it's too easy to fill up the one slot and cause packet loss.

the ifq len was set to 1 to avoid delays produced by the original
implementation of tx mitigation. however, trunk now introduces
latency because it isn't mpsafe yet, which causes the network stack
to have to take the kernel lock for each packet, and the kernel
lock can be quite contended. i want to use the ifq to move the
packet to the systq thread (which already has the kernel lock)
before trunk is asked to transmit it.

tested by mark patruck and myself.


Revision tags: OPENBSD_6_7_BASE
# 1.144 06-Dec-2019 dlg

when copying capabilities from the first port to a trunk, copy hardmtu too.

previously it copied the ports if_mtu to the trunks if_hardmtu,
which makes it hard for things like vlan(4) to work with a full
frame size, or large frame size.

tested by hrvoje popovski


# 1.143 07-Nov-2019 dlg

turn the linkstate hooks into a task list, like the detach hooks.

this is largely mechanical, except for carp. this moves the addition
of the carp link state hook after we're committed to using the new
interface as a carpdev. because the add can't fail, we avoid a
complicated unwind dance. also, this tweaks the carp linkstate hook
so it only updates the relevant carp interface, not all of the
carpdevs on the parent.

hrvoje popovski has tested an early version of this diff and it's
generally ok, but there's some splasserts that this diff fires that
i'll fix in an upcoming diff.

ok claudio@


# 1.142 06-Nov-2019 dlg

replace the hooks used with if_detachhooks with a task list.

the main semantic change is that things registering detach hooks
have to allocate and set a task structure that then gets added to
the list. this means if the task is allocated up front (eg, as part
of carps softc or bridges port structure), it avoids the possibility
that adding a hook can fail. a lot of drivers weren't checking for
failure, and unwinding state in the event of failure in other parts
was error prone.

while doing this i discovered that the list operations have to be
in a particular order, but drivers weren't doing that consistently
either. this diff wraps the list ops up so you have to seriously
go out of your way to screw them up.

ive also sprinkled some NET_ASSERT_LOCKED around the list operations
so we can make sure there's no potential for the list to be corrupted,
especially while it's being run.

hrvoje popovski has tested this a bit, and some issues he discovered
have been fixed.

ok sashan@


Revision tags: OPENBSD_6_6_BASE
# 1.141 05-Jul-2019 dlg

record when trunk takes over an interface by setting ac_trunkport

this will be used to prevent trunk and the upcoming aggr driver
from taking ownership of an Ethernet interface at the same time.


# 1.140 11-May-2019 florian

A trunk(4) usually stays up when the link state of one of its members
changes. While we do get RTM_IFINFO messages for the (physical) member
interfaces there is no indication that something changed from the
trunk(4) interface.
It is helpful to get this information in userland from the trunk so that
userland daemons do not need to track interface membership by themselves.
OK phessler


# 1.139 29-Apr-2019 dlg

tr_unit is unused, so gc it


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this let's input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer mess with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_cast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restore the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
informations in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids getting trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should be never called as we don't set up it's
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
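
A minimal userland sketch of the idea in this commit, not the kernel
code: packets carry an interface index, and the pointer is looked up
only when needed. All toy_* names and the table size are hypothetical,
and locking and reference handling are left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXIF	8	/* hypothetical table size */

struct toy_ifnet {
	unsigned int	if_index;	/* unique ID; 0 is treated as "none" */
};

struct toy_pkt {
	unsigned int	ph_ifidx;	/* index of the receiving interface */
};

static struct toy_ifnet *toy_iftable[TOY_MAXIF];

/* Publish an interface under its index. */
void
toy_if_attach(struct toy_ifnet *ifp)
{
	assert(ifp->if_index > 0 && ifp->if_index < TOY_MAXIF);
	toy_iftable[ifp->if_index] = ifp;
}

/* Remove an interface; packets still naming its index become stale. */
void
toy_if_detach(struct toy_ifnet *ifp)
{
	toy_iftable[ifp->if_index] = NULL;
}

/* Stand-in for if_get(): returns NULL when the interface is gone. */
struct toy_ifnet *
toy_if_get(unsigned int idx)
{
	if (idx == 0 || idx >= TOY_MAXIF)
		return NULL;
	return toy_iftable[idx];
}

/* Returns 1 if the packet may be processed, 0 if it should be freed. */
int
toy_pkt_input(const struct toy_pkt *p)
{
	return toy_if_get(p->ph_ifidx) != NULL;
}
```

A stale index simply fails the lookup, so there is no dangling ifp
pointer to dereference.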


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@
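
The MTU rule described above can be sketched roughly as follows. This
is a userland illustration, not the driver code: the toy_* names are
hypothetical and the EINVAL return for a mismatch is an assumption.

```c
#include <assert.h>
#include <errno.h>

struct toy_trunk {
	unsigned int	tr_mtu;		/* MTU of the trunk itself */
	int		tr_count;	/* number of member ports */
};

/*
 * Add a port with the given MTU.  The first port defines the trunk
 * MTU; every later port has to match it.
 */
int
toy_trunk_port_add(struct toy_trunk *tr, unsigned int port_mtu)
{
	assert(tr->tr_count >= 0);

	if (tr->tr_count == 0)
		tr->tr_mtu = port_mtu;	/* inherit from the first port */
	else if (port_mtu != tr->tr_mtu)
		return EINVAL;		/* assumption: error for a mismatch */

	tr->tr_count++;
	return 0;
}
```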


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@
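
A rough userland sketch of the copy-before-transmit pattern described
above, assuming a flat toy packet instead of an mbuf chain. The toy_*
names and the transmit hook are hypothetical; the hook scribbles on the
packet to mimic an interface that modifies what it was given.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_pkt {
	size_t	len;
	char	data[64];
};

/* Hypothetical transmit hook: owns the packet and may modify it. */
typedef void (*toy_xmit_fn)(int port, struct toy_pkt *p);

char toy_seen[8];	/* first byte each port was handed */

void
toy_xmit_scribble(int port, struct toy_pkt *p)
{
	assert(port >= 0 && port < 8);
	toy_seen[port] = p->data[0];
	p->data[0] = '!';	/* the port is free to change its packet */
	free(p);
}

static struct toy_pkt *
toy_pkt_copy(const struct toy_pkt *p)
{
	struct toy_pkt *c = malloc(sizeof(*c));

	if (c != NULL)
		memcpy(c, p, sizeof(*c));
	return c;
}

/*
 * Send p on ports 0..nports-1 and consume it.  Each copy is taken
 * from the untouched original before that port transmits; the last
 * port gets the original, outside the loop.  Returns the number of
 * ports that were handed a packet.
 */
int
toy_bcast(struct toy_pkt *p, int nports, toy_xmit_fn xmit)
{
	int i, sent = 0;

	assert(nports >= 0);
	if (nports == 0) {
		free(p);
		return 0;
	}
	for (i = 0; i < nports - 1; i++) {
		struct toy_pkt *c = toy_pkt_copy(p);

		if (c == NULL)
			continue;	/* skip this port on allocation failure */
		xmit(i, c);
		sent++;
	}
	xmit(nports - 1, p);
	return sent + 1;
}
```

Because no copy is ever taken from a packet that has already been
transmitted, a port that rewrites its packet cannot affect the others.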


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@
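
The leak fixed here comes from the difference between freeing one
buffer and freeing a whole chain. A simplified userland sketch, with
hypothetical toy_* stand-ins for m_free() and m_freem() and a counter
added only to make the leak visible:

```c
#include <assert.h>
#include <stdlib.h>

struct toy_mbuf {
	struct toy_mbuf	*m_next;	/* next buffer in the same chain */
};

int toy_live;	/* number of buffers currently allocated */

/* Allocate one buffer and link it in front of next. */
struct toy_mbuf *
toy_get(struct toy_mbuf *next)
{
	struct toy_mbuf *m = malloc(sizeof(*m));

	assert(m != NULL);
	m->m_next = next;
	toy_live++;
	return m;
}

/* Like m_free(): release a single buffer, return its successor. */
struct toy_mbuf *
toy_free(struct toy_mbuf *m)
{
	struct toy_mbuf *n = m->m_next;

	free(m);
	toy_live--;
	return n;
}

/* Like m_freem(): release every buffer in the chain. */
void
toy_freem(struct toy_mbuf *m)
{
	while (m != NULL)
		m = toy_free(m);
}
```

Calling the single-buffer variate on a multi-buffer chain and dropping
the return value leaves the rest of the chain allocated, which is the
kind of leak this commit removes.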


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interface's start
routine.

it also prevents an artificial inflation of the physical interfaces queue
length where packets could hide on the virtual interfaces queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code, rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@
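
A minimal sketch of the failure mode and the guard, assuming the port
is picked as hash modulo the number of ports. The function name, the
-1 return value and TOY_MAX_PORTS are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_PORTS	32	/* hypothetical upper bound */

/*
 * Pick a port index for a packet hash.  Returns -1 when the trunk
 * has no ports, instead of computing hash % 0.
 */
int
toy_lb_pick(uint32_t hash, int count)
{
	assert(count >= 0 && count <= TOY_MAX_PORTS);

	if (count == 0)
		return -1;
	/* the result is always smaller than count */
	return (int)(hash % (uint32_t)count);
}
```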


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as they're figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frames on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


# 1.148 22-Jul-2020 dlg

deprecate interface input handler lists, just use one input function.

the interface input handler lists were originally set up to help
us during the initial mpsafe network stack work. at the time not all
the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc)
were mpsafe, so we wanted a way to avoid them by default, and only
take the kernel lock hit when they were specifically enabled on the
interface. since then, they have been fixed up to be mpsafe.

i could leave the list in place, but it has some semantic problems.
because virtual interfaces filter packets based on the order they
were attached to the parent interface, you can get packets taken
away in surprising ways, especially when you reboot and netstart
does something different to what you did by hand. by hardcoding the
order that things like vlan and bridge get to look at packets, we
can document the behaviour and get consistency.

it also means we can get rid of a use of SRPs which were difficult
to replace with SMRs. the interface input handler list is an SRPL,
which we would like to deprecate. it turns out that you can sleep
during stack processing, which you're not supposed to do with SRPs
or SMRs, but SRPs are a lot more forgiving and it worked.

lastly, it turns out that this code is faster than the input list
handling, so lots of winning all around.

special thanks to hrvoje popovski and aaron bieber for testing.
this has been in snaps as part of a larger diff for over a week.


# 1.147 10-Jul-2020 patrick

Change users of IFQ_DEQUEUE(), IFQ_ENQUEUE() and IFQ_LEN() to use the
"new" API.

ok dlg@ tobhe@


# 1.146 17-Jun-2020 dlg

make ph_flowid in mbufs 16bits by storing whether it's set in csum_flags.

i've been wanting to do this for a while, and now that we've got
stoeplitz and it gives us 16 bits, it seems like the right time.


# 1.145 21-May-2020 dlg

don't limit the output queue (ifq) length to 1 anymore.

if we use the ifq to move packet processing to another context,
it's too easy to fill up the one slot and cause packet loss.

the ifq len was set to 1 to avoid delays produced by the original
implementation of tx mitigation. however, trunk now introduces
latency because it isn't mpsafe yet, which causes the network stack
to have to take the kernel lock for each packet, and the kernel
lock can be quite contended. i want to use the ifq to move the
packet to the systq thread (which already has the kernel lock)
before trunk is asked to transmit it.

tested by mark patruck and myself.


Revision tags: OPENBSD_6_7_BASE
# 1.144 06-Dec-2019 dlg

when copying capabilities from the first port to a trunk, copy hardmtu too.

previously it copied the ports if_mtu to the trunks if_hardmtu,
which makes it hard for things like vlan(4) to work with a full
frame size, or large frame size.

tested by hrvoje popovski


# 1.143 07-Nov-2019 dlg

turn the linkstate hooks into a task list, like the detach hooks.

this is largely mechanical, except for carp. this moves the addition
of the carp link state hook after we're committed to using the new
interface as a carpdev. because the add can't fail, we avoid a
complicated unwind dance. also, this tweaks the carp linkstate hook
so it only updates the relevant carp interface, not all of the
carpdevs on the parent.

hrvoje popovski has tested an early version of this diff and it's
generally ok, but there's some splasserts that this diff fires that
i'll fix in an upcoming diff.

ok claudio@


# 1.142 06-Nov-2019 dlg

replace the hooks used with if_detachhooks with a task list.

the main semantic change is that things registering detach hooks
have to allocate and set a task structure that then gets added to
the list. this means if the task is allocated up front (eg, as part
of carps softc or bridges port structure), it avoids the possibility
that adding a hook can fail. a lot of drivers weren't checking for
failure, and unwinding state in the event of failure in other parts
was error prone.

while doing this i discovered that the list operations have to be
in a particular order, but drivers weren't doing that consistently
either. this diff wraps the list ops up so you have to seriously
go out of your way to screw them up.

ive also sprinkled some NET_ASSERT_LOCKED around the list operations
so we can make sure there's no potential for the list to be corrupted,
especially while it's being run.

hrvoje popovski has tested this a bit, and some issues he discovered
have been fixed.

ok sashan@


Revision tags: OPENBSD_6_6_BASE
# 1.141 05-Jul-2019 dlg

record when trunk takes over an interface by setting ac_trunkport

this will be used to prevent trunk and the upcoming aggr driver
from taking ownership of an Ethernet interface at the same time.


# 1.140 11-May-2019 florian

A trunk(4) usually stays up when the link state of one of its members
changes. While we do get RTM_IFINFO messages for the (physical) member
interfaces there is no indication that something changed from the
trunk(4) interface.
It is helpful to get this information in userland from the trunk so that
userland daemons do not need to track interface membership by themselves.
OK phessler


# 1.139 29-Apr-2019 dlg

tr_unit is unused, so gc it


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this lets input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_bcast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@
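
The bug pattern described above, sketched in userland C. The toy_*
names and the per-port send hook are hypothetical; only the shape of
the problem and of the fix follows the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-port send hook: 0 on success, errno on failure. */
typedef int (*toy_send_fn)(int port);

/* Example hook in which only port 0 fails. */
int
toy_send_port0_fails(int port)
{
	return port == 0 ? ENOBUFS : 0;
}

/* Buggy shape: a later success overwrites an earlier error. */
int
toy_bcast_last_wins(int nports, toy_send_fn send)
{
	int i, ret = 0;

	for (i = 0; i < nports; i++)
		ret = send(i);
	return ret;
}

/* Fixed shape: any failure yields a fixed non-zero error. */
int
toy_bcast_any_error(int nports, toy_send_fn send)
{
	int i, failed = 0;

	assert(nports >= 0);
	for (i = 0; i < nports; i++) {
		if (send(i) != 0)
			failed = 1;
	}
	return failed ? ENOBUFS : 0;
}
```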


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restores the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@
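
A small sketch of the shortcut described above: use the flow ID when
the packet carries a valid one, and hash header bytes only otherwise.
The struct layout, the flag name and the field widths are assumptions,
and FNV-1a is used here purely as a simple stand-in for siphash24.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FLOWID_VALID	0x1	/* hypothetical flag */

struct toy_pkt {
	int		 flags;
	uint32_t	 flowid;	/* only meaningful if flagged valid */
	const uint8_t	*hdr;		/* header bytes to hash otherwise */
	size_t		 hdrlen;
};

/* FNV-1a over the header bytes; NOT siphash24, just a placeholder. */
static uint32_t
toy_hash_hdr(const uint8_t *p, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Return the value used to choose an output port. */
uint32_t
toy_pkt_hash(const struct toy_pkt *pkt)
{
	assert(pkt != NULL);

	if (pkt->flags & TOY_FLOWID_VALID)
		return pkt->flowid;	/* skip parsing and hashing */
	return toy_hash_hdr(pkt->hdr, pkt->hdrlen);
}
```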


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
information in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way to do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids giving trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should never be called as we don't set up its
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such a mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@
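
The rule above can be sketched as a small check. `struct trunk` and `trunk_port_add()` are hypothetical, reduced names for illustration, and the EINVAL return value is an assumption rather than the driver's documented errno.

```c
#include <errno.h>

/* Hypothetical, reduced view of a trunk: just what the MTU rule needs. */
struct trunk {
	unsigned int	nports;	/* number of member ports */
	unsigned int	mtu;	/* MTU inherited from the first port */
};

/*
 * Add a port with the given MTU. The first port defines the trunk MTU;
 * later ports are refused unless their MTU matches. EINVAL is an
 * assumption made for this sketch.
 */
int
trunk_port_add(struct trunk *tr, unsigned int port_mtu)
{
	if (tr->nports == 0)
		tr->mtu = port_mtu;
	else if (port_mtu != tr->mtu)
		return (EINVAL);
	tr->nports++;
	return (0);
}
```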


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@
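
The copy-before-transmit pattern described above can be modelled in userland C. This is a sketch under assumptions: `struct frame`, `struct port`, `port_xmit()` and `bcast_start()` are hypothetical stand-ins for the mbuf and port handling, and `port_xmit()` deliberately scribbles on the frame to mimic an interface that modifies what it is given.

```c
#include <stdlib.h>
#include <string.h>

struct frame {
	size_t		len;
	unsigned char	data[64];
};

struct port {
	int		active;	/* non-zero if the port may transmit */
	unsigned int	sent;	/* frames handed to this port */
	unsigned char	seen;	/* first byte of the last frame received */
};

/* Hypothetical transmit routine: takes ownership of f and may modify it. */
void
port_xmit(struct port *p, struct frame *f)
{
	p->seen = f->data[0];
	p->sent++;
	f->data[0] ^= 0xff;	/* simulate a driver scribbling on the frame */
	free(f);
}

/* Send f on every active port; consumes f in all cases. */
int
bcast_start(struct port *ports, size_t nports, struct frame *f)
{
	struct port *last = NULL;
	struct frame *copy;
	size_t i;

	for (i = 0; i < nports; i++) {
		if (!ports[i].active)
			continue;
		if (last != NULL) {
			/* Copy while f is still untouched, then send the copy. */
			if ((copy = malloc(sizeof(*copy))) == NULL) {
				free(f);
				return (-1);
			}
			memcpy(copy, f, sizeof(*copy));
			port_xmit(last, copy);
		}
		last = &ports[i];
	}
	if (last == NULL) {
		free(f);	/* no active port: drop the frame */
		return (-1);
	}
	port_xmit(last, f);	/* the original goes to the final port */
	return (0);
}
```

Because every copy is taken from the original before any port has seen it, a port that alters its frame cannot affect what the remaining ports transmit.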


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interface's start
routine.

it also prevents an artificial inflation of the physical interfaces queue
length where packets could hide on the virtual interfaces queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as theyre figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frames on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@
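
The rule stated above amounts to an "any port up" reduction. A minimal sketch, with `enum link_state` and `trunk_link_state()` as hypothetical names chosen for illustration:

```c
#include <stddef.h>

enum link_state { LINK_DOWN, LINK_UP };

/* Hypothetical helper: the trunk is up if any member port is up. */
enum link_state
trunk_link_state(const enum link_state *ports, size_t nports)
{
	size_t i;

	for (i = 0; i < nports; i++)
		if (ports[i] == LINK_UP)
			return (LINK_UP);
	return (LINK_DOWN);	/* also covers a trunk with no ports */
}
```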


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing an mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


# 1.147 10-Jul-2020 patrick

Change users of IFQ_DEQUEUE(), IFQ_ENQUEUE() and IFQ_LEN() to use the
"new" API.

ok dlg@ tobhe@


# 1.146 17-Jun-2020 dlg

make ph_flowid in mbufs 16bits by storing whether it's set in csum_flags.

i've been wanting to do this for a while, and now that we've got
stoeplitz and it gives us 16 bits, it seems like the right time.


# 1.145 21-May-2020 dlg

don't limit the output queue (ifq) length to 1 anymore.

if we use the ifq to move packet processing to another context,
it's too easy to fill up the one slot and cause packet loss.

the ifq len was set to 1 to avoid delays produced by the original
implementation of tx mitigation. however, trunk now introduces
latency because it isn't mpsafe yet, which causes the network stack
to have to take the kernel lock for each packet, and the kernel
lock can be quite contended. i want to use the ifq to move the
packet to the systq thread (which already has the kernel lock)
before trunk is asked to transmit it.

tested by mark patruck and myself.


Revision tags: OPENBSD_6_7_BASE
# 1.144 06-Dec-2019 dlg

when copying capabilities from the first port to a trunk, copy hardmtu too.

previously it copied the ports if_mtu to the trunks if_hardmtu,
which makes it hard for things like vlan(4) to work with a full
frame size, or large frame size.

tested by hrvoje popovski


# 1.143 07-Nov-2019 dlg

turn the linkstate hooks into a task list, like the detach hooks.

this is largely mechanical, except for carp. this moves the addition
of the carp link state hook after we're committed to using the new
interface as a carpdev. because the add can't fail, we avoid a
complicated unwind dance. also, this tweaks the carp linkstate hook
so it only updates the relevant carp interface, not all of the
carpdevs on the parent.

hrvoje popovski has tested an early version of this diff and it's
generally ok, but there's some splasserts that this diff fires that
i'll fix in an upcoming diff.

ok claudio@


# 1.142 06-Nov-2019 dlg

replace the hooks used with if_detachhooks with a task list.

the main semantic change is that things registering detach hooks
have to allocate and set a task structure that then gets added to
the list. this means if the task is allocated up front (eg, as part
of carps softc or bridges port structure), it avoids the possibility
that adding a hook can fail. a lot of drivers weren't checking for
failure, and unwinding state in the event of failure in other parts
was error prone.

while doing this i discovered that the list operations have to be
in a particular order, but drivers weren't doing that consistently
either. this diff wraps the list ops up so you have to seriously
go out of your way to screw them up.

ive also sprinkled some NET_ASSERT_LOCKED around the list operations
so we can make sure there's no potential for the list to be corrupted,
especially while it's being run.

hrvoje popovski has tested this a bit, and some issues he discovered
have been fixed.

ok sashan@


Revision tags: OPENBSD_6_6_BASE
# 1.141 05-Jul-2019 dlg

record when trunk takes over an interface by setting ac_trunkport

this will be used to prevent trunk and the upcoming aggr driver
from taking ownership of an Ethernet interface at the same time.


# 1.140 11-May-2019 florian

A trunk(4) usually stays up when the link state of one of its members
changes. While we do get RTM_IFINFO messages for the (physical) member
interfaces there is no indication that something changed from the
trunk(4) interface.
It is helpful to get this information in userland from the trunk so that
userland daemons do not need to track interface membership by themselves.
OK phessler


# 1.139 29-Apr-2019 dlg

tr_unit is unused, so gc it


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this lets input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_bcast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restores the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@
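
The shortcut described in this entry might look roughly like the sketch below. It is an illustration under stated assumptions, not the driver code: `struct pkt_meta`, `fallback_hash` and `pkt_hash` are hypothetical names, and FNV-1a is used only as a small stand-in for siphash24.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified packet metadata; not the kernel's struct mbuf. */
struct pkt_meta {
	int			 has_flowid;	/* non-zero if flowid is valid */
	uint32_t		 flowid;
	const unsigned char	*hdr;		/* header bytes to hash otherwise */
	size_t			 hdrlen;
};

/* FNV-1a, used here only as a small stand-in for the real keyed hash. */
uint32_t
fallback_hash(const unsigned char *p, size_t len)
{
	uint32_t h = 2166136261u;

	while (len-- > 0) {
		h ^= *p++;
		h *= 16777619u;
	}
	return (h);
}

/* Prefer the precomputed flow id; hash the headers only when it is absent. */
uint32_t
pkt_hash(const struct pkt_meta *m)
{
	assert(m != NULL);
	if (m->has_flowid)
		return (m->flowid);
	return (fallback_hash(m->hdr, m->hdrlen));
}
```

The saving comes from skipping both the header parsing and the hash computation whenever a flow id is already attached to the packet.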


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is a slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
information in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way to do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids giving trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should never be called as we don't set up its
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers; we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any directly used symbols. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rdnvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@
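
The copy-before-transmit pattern described in this entry can be sketched with flat buffers in place of mbuf chains. Everything here is hypothetical (`struct pkt`, `port_tx`, `bcast_tx`); the transmit routine deliberately modifies and frees the packet to mimic a driver that alters what it is handed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat packet; the kernel uses mbuf chains instead. */
struct pkt {
	size_t		len;
	unsigned char	data[32];
};

/*
 * Illustrative transmit: records the first byte it saw, then modifies
 * and consumes the packet, as a driver is free to do.
 */
void
port_tx(struct pkt *p, unsigned char *seen)
{
	*seen = p->data[0];
	p->data[0] ^= 0xff;
	free(p);
}

/* Copy before each transmit; the original goes to the last port. */
int
bcast_tx(struct pkt *p, unsigned char *seen, size_t nports)
{
	struct pkt *copy;
	size_t i;

	assert(p != NULL);
	if (nports == 0) {
		free(p);
		return (ENOBUFS);
	}
	for (i = 0; i + 1 < nports; i++) {
		copy = malloc(sizeof(*copy));
		if (copy == NULL) {
			free(p);
			return (ENOBUFS);
		}
		memcpy(copy, p, sizeof(*copy));
		port_tx(copy, &seen[i]);
	}
	/* Last port: handled outside the loop, no copy needed. */
	port_tx(p, &seen[nports - 1]);
	return (0);
}
```

Because each copy is taken from the untouched original, every port sees the same data no matter what the previous port did to its packet.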


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revisions limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interface's start
routine.

it also prevents an artificial inflation of the physical interfaces queue
length where packets could hide on the virtual interfaces queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@
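
A guard of the kind this entry describes can be sketched as below. `lb_pick_port` is a hypothetical helper, and returning -1 for an empty trunk is an assumption made for the example rather than the driver's actual behaviour.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the chosen port index, or -1 when the trunk has no ports.
 * The early return keeps "hash % count" from dividing by zero.
 */
int
lb_pick_port(uint32_t hash, unsigned int count)
{
	unsigned int idx;

	if (count == 0)
		return (-1);
	idx = hash % count;
	assert(idx < count);	/* a remainder is always below the divisor */
	return ((int)idx);
}
```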


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as theyre figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frames on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid a
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@
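
The selection rule described in this entry can be sketched as follows. This is a simplified model with hypothetical names (`struct port`, `failover_pick`); it assumes the master port is stored at index 0 and that "active" simply means the link is up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical port record: only the link state matters here. */
struct port {
	int	link_up;
};

/*
 * Returns the index of the port to use: the master (index 0) when its
 * link is up, otherwise the next port with link, or -1 if none has link.
 */
int
failover_pick(const struct port *ports, size_t n)
{
	size_t i;

	assert(ports != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (ports[i].link_up)
			return ((int)i);
	}
	return (-1);
}
```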


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.
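
The "smallest common interface capabilities" item above amounts to intersecting the capability bits of all ports. A minimal sketch, assuming capabilities are a bitmask; the `CAP_*` constants and `common_caps` are hypothetical and do not correspond to the kernel's definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits, for illustration only. */
#define CAP_VLAN_MTU	0x01u
#define CAP_CSUM	0x02u

/* A capability survives only if every attached port advertises it. */
uint32_t
common_caps(const uint32_t *port_caps, size_t nports)
{
	uint32_t caps;
	size_t i;

	assert(port_caps != NULL || nports == 0);
	if (nports == 0)
		return (0);
	caps = port_caps[0];
	for (i = 1; i < nports; i++)
		caps &= port_caps[i];
	return (caps);
}
```

In this model a trunk of two ports advertises VLAN MTU support only when both ports have that bit set.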


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


# 1.146 17-Jun-2020 dlg

make ph_flowid in mbufs 16bits by storing whether it's set in csum_flags.

i've been wanting to do this for a while, and now that we've got
stoeplitz and it gives us 16 bits, it seems like the right time.


# 1.145 21-May-2020 dlg

don't limit the output queue (ifq) length to 1 anymore.

if we use the ifq to move packet processing to another context,
it's too easy to fill up the one slot and cause packet loss.

the ifq len was set to 1 to avoid delays produced by the original
implementation of tx mitigation. however, trunk now introduces
latency because it isn't mpsafe yet, which causes the network stack
to have to take the kernel lock for each packet, and the kernel
lock can be quite contended. i want to use the ifq to move the
packet to the systq thread (which already has the kernel lock)
before trunk is asked to transmit it.

tested by mark patruck and myself.


Revision tags: OPENBSD_6_7_BASE
# 1.144 06-Dec-2019 dlg

when copying capabilities from the first port to a trunk, copy hardmtu too.

previously it copied the ports if_mtu to the trunks if_hardmtu,
which makes it hard for things like vlan(4) to work with a full
frame size, or large frame size.

tested by hrvoje popovski


# 1.143 07-Nov-2019 dlg

turn the linkstate hooks into a task list, like the detach hooks.

this is largely mechanical, except for carp. this moves the addition
of the carp link state hook after we're committed to using the new
interface as a carpdev. because the add can't fail, we avoid a
complicated unwind dance. also, this tweaks the carp linkstate hook
so it only updates the relevant carp interface, not all of the
carpdevs on the parent.

hrvoje popovski has tested an early version of this diff and it's
generally ok, but there's some splasserts that this diff fires that
i'll fix in an upcoming diff.

ok claudio@


# 1.142 06-Nov-2019 dlg

replace the hooks used with if_detachhooks with a task list.

the main semantic change is that things registering detach hooks
have to allocate and set a task structure that then gets added to
the list. this means if the task is allocated up front (eg, as part
of carps softc or bridges port structure), it avoids the possibility
that adding a hook can fail. a lot of drivers weren't checking for
failure, and unwinding state in the event of failure in other parts
was error prone.

while doing this i discovered that the list operations have to be
in a particular order, but drivers weren't doing that consistently
either. this diff wraps the list ops up so you have to seriously
go out of your way to screw them up.

ive also sprinkled some NET_ASSERT_LOCKED around the list operations
so we can make sure there's no potential for the list to be corrupted,
especially while it's being run.

hrvoje popovski has tested this a bit, and some issues he discovered
have been fixed.

ok sashan@


Revision tags: OPENBSD_6_6_BASE
# 1.141 05-Jul-2019 dlg

record when trunk takes over an interface by setting ac_trunkport

this will be used to prevent trunk and the upcoming aggr driver
from taking ownership of an Ethernet interface at the same time.


# 1.140 11-May-2019 florian

A trunk(4) usually stays up when the link state of one of its members
changes. While we do get RTM_IFINFO messages for the (physical) member
interfaces there is no indication that something changed from the
trunk(4) interface.
It is helpful to get this information in userland from the trunk so that
userland daemons do not need to track interface membership by themselves.
OK phessler


# 1.139 29-Apr-2019 dlg

tr_unit is unused, so gc it


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this lets input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_cast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restores the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
information in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way to do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids giving trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should never be called as we don't set up its
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interfaces start
routine.

it also prevents an artificial inflation of the physical interfaces queue
length where packets could hide on the virtual interfaces queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as theyre figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frames on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this let's input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_cast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@
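
The pattern described in 1.134 can be illustrated outside the kernel. The following is only a user-space sketch under stated assumptions: `send_all_buggy()` and `send_all_fixed()` are hypothetical names, and the `results` array stands in for the per-port enqueue outcomes; none of this is the driver's actual code.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Buggy shape: "ret" is overwritten on every iteration, so a later
 * success hides an earlier failure and the function returns 0.
 */
static int
send_all_buggy(const int *results, size_t n)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++)
		ret = results[i];	/* last result wins */
	return ret;
}

/*
 * Fixed shape: no shared "ret"; any failure is remembered and
 * reported as ENOBUFS (non-zero), as the commit message describes.
 */
static int
send_all_fixed(const int *results, size_t n)
{
	size_t i;
	int errors = 0;

	for (i = 0; i < n; i++) {
		if (results[i] != 0)
			errors++;
	}
	return errors ? ENOBUFS : 0;
}
```

With outcomes of "failure, then success", the buggy variant reports success while the fixed variant reports ENOBUFS.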


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restores the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@
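
The idea in 1.120 can be sketched in user space. This is a simplified illustration, not the driver's code: `struct pkt`, `hdr_hash()` and `pick_port()` are hypothetical names, and the fallback hash here is FNV-1a purely as a placeholder for siphash24.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant packet header fields. */
struct pkt {
	int		 flowid_valid;	/* non-zero if a flow id is attached */
	uint32_t	 flowid;
	const uint8_t	*hdr;		/* bytes that would otherwise be hashed */
	size_t		 hdrlen;
};

/* Placeholder hash (FNV-1a); the driver uses siphash24, not this. */
static uint32_t
hdr_hash(const uint8_t *p, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Pick a port index in [0, nports); returns -1 when there are no ports. */
static int
pick_port(const struct pkt *p, unsigned int nports)
{
	uint32_t key;

	if (nports == 0)
		return -1;	/* also avoids a modulo by zero */

	/* Prefer the precomputed flow id; only hash when it is absent. */
	key = p->flowid_valid ? p->flowid : hdr_hash(p->hdr, p->hdrlen);
	return (int)(key % nports);
}
```

The saving comes from skipping both the header parsing and the hash whenever a flow id is already available.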


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
information in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way to do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids getting trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should never be called as we don't set up its
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@
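
The check described in 1.109 might look roughly like the sketch below. This is an assumption about the shape of the logic, not a copy of the driver: `should_drop()` is a hypothetical helper, and the rule it encodes (keep multicast/broadcast and frames addressed to our own MAC, drop the rest unless promiscuous) is inferred from the commit message.

```c
#include <stdint.h>
#include <string.h>

/*
 * Returns 1 if a received frame should be dropped, 0 if it should be
 * passed up the stack.
 *   dst     - destination MAC address of the frame
 *   our     - MAC address of the trunk interface
 *   promisc - non-zero if the trunk interface is in promiscuous mode
 */
static int
should_drop(const uint8_t dst[6], const uint8_t our[6], int promisc)
{
	/* Group bit set: multicast or broadcast, always accepted here. */
	if (dst[0] & 0x01)
		return 0;

	/* Unicast addressed to us. */
	if (memcmp(dst, our, 6) == 0)
		return 0;

	/*
	 * Unicast for someone else: only seen because a member port is
	 * promiscuous. Keep it only if the trunk itself is promiscuous.
	 */
	return promisc ? 0 : 1;
}
```

Without such a check, frames for other hosts reach pf, which finds no matching state and answers with a reset, as the commit message recounts.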


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@
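
The MTU rule from 1.82 can be sketched as follows. This is a minimal user-space illustration: `struct trunk_sketch` and `trunk_add_port_mtu()` are hypothetical, and returning EINVAL on a mismatch is an assumption, since the commit message does not say which error the driver uses.

```c
#include <errno.h>

/* Hypothetical, reduced view of the trunk state relevant to the MTU rule. */
struct trunk_sketch {
	unsigned int	nports;	/* number of member ports */
	unsigned int	mtu;	/* MTU currently set on the trunk */
};

/*
 * Apply the MTU rule when adding a port:
 *  - the first port sets the trunk's MTU;
 *  - every later port must already match it.
 * Returns 0 on success, EINVAL (assumed) on a mismatch.
 */
static int
trunk_add_port_mtu(struct trunk_sketch *tr, unsigned int port_mtu)
{
	if (tr->nports == 0)
		tr->mtu = port_mtu;
	else if (tr->mtu != port_mtu)
		return EINVAL;

	tr->nports++;
	return 0;
}
```

Under this rule, adding a 9000-byte port to an empty trunk raises the trunk to 9000, after which a 1500-byte port is refused.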


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@
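
The copy-before-transmit dance from 1.79 can be shown with plain heap buffers instead of mbufs. This is a sketch under assumptions: `struct frame`, `frame_dup()`, `bcast_send()` and `scribbling_xmit()` are hypothetical, and the transmit callback deliberately modifies and frees what it is given to mimic an interface that alters the chain.

```c
#include <stdlib.h>
#include <string.h>

struct frame {
	size_t		 len;
	unsigned char	*data;
};

static void
frame_free(struct frame *f)
{
	free(f->data);
	free(f);
}

static struct frame *
frame_dup(const struct frame *f)
{
	struct frame *c = malloc(sizeof(*c));

	if (c == NULL)
		return NULL;
	c->data = malloc(f->len ? f->len : 1);
	if (c->data == NULL) {
		free(c);
		return NULL;
	}
	memcpy(c->data, f->data, f->len);
	c->len = f->len;
	return c;
}

/* Transmit callback: takes ownership of the frame it is handed. */
typedef int (*xmit_fn)(int port, struct frame *f);

/*
 * Send f on every port. Copies are always taken from the untouched
 * original *before* anything is transmitted on that port; the original
 * itself goes out last. Returns the number of ports used, -1 on ENOMEM.
 */
static int
bcast_send(struct frame *f, const int *ports, size_t nports, xmit_fn xmit)
{
	size_t i;
	int sent = 0;

	if (nports == 0) {
		frame_free(f);
		return 0;
	}
	for (i = 0; i + 1 < nports; i++) {
		struct frame *c = frame_dup(f);

		if (c == NULL) {
			frame_free(f);
			return -1;
		}
		xmit(ports[i], c);
		sent++;
	}
	xmit(ports[nports - 1], f);	/* last port gets the original */
	return sent + 1;
}

/* Demo callback: records the first byte seen, then scribbles and frees. */
static unsigned char seen[8];
static size_t nseen;

static int
scribbling_xmit(int port, struct frame *f)
{
	(void)port;
	if (nseen < sizeof(seen) && f->len > 0)
		seen[nseen++] = f->data[0];
	memset(f->data, 0xff, f->len);
	frame_free(f);
	return 0;
}
```

Because no copy is ever taken from a frame that has already been handed to an interface, a callback that modifies its frame cannot corrupt what later ports receive.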


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interfaces start
routine.

it also prevents an artificial inflation of the physical interfaces queue
length where packets could hide on the virtual interfaces queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as theyre figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frame on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


Revision tags: OPENBSD_6_6_BASE
# 1.141 05-Jul-2019 dlg

record when trunk takes over an interface by setting ac_trunkport

this will be used to prevent trunk and the upcoming aggr driver
from taking ownership of an Ethernet interface at the same time.


# 1.140 11-May-2019 florian

A trunk(4) usually stays up when the link state of one of its members
changes. While we do get RTM_IFINFO messages for the (physical) member
interfaces there is no indication that something changed from the
trunk(4) interface.
It is helpful to get this information in userland from the trunk so that
userland daemons do not need to track interface membership by themselves.
OK phessler


# 1.139 29-Apr-2019 dlg

tr_unit is unused, so gc it


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this lets input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_cast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@
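
A minimal sketch of the pattern described above, assuming a loop that sends on several ports. The helper `send_one()` and both `send_all*()` functions are hypothetical and are not the code of trunk_bcast_start(); they only contrast an overwritten return value with an early non-zero return.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-port send: fails when the port's queue is full. */
int
send_one(int queue_full)
{
	return queue_full ? ENOBUFS : 0;
}

/* Problem pattern: a later success overwrites an earlier error. */
int
send_all_overwriting(const int *full, size_t n)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++)
		ret = send_one(full[i]);
	return ret;
}

/* Fixed pattern: no "ret"; report non-zero as soon as a send fails. */
int
send_all(const int *full, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (send_one(full[i]) != 0)
			return ENOBUFS;
	}
	return 0;
}
```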


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restores the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@
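
The shortcut can be sketched in user space as follows. The flag and mask values, `struct pkt_meta`, and `flow_hash()` are assumptions made for illustration, and FNV-1a is used only as a simple stand-in for siphash24; none of this is the driver's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow id layout: one "valid" bit plus a 15-bit id. */
#define FLOWID_VALID	0x8000
#define FLOWID_MASK	0x7fff

struct pkt_meta {
	uint16_t		 flowid;	/* valid bit and flow id */
	const unsigned char	*hdr;		/* header bytes to hash */
	size_t			 hdrlen;
};

/* FNV-1a, standing in for the more expensive keyed hash. */
uint32_t
fnv1a(const unsigned char *buf, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

uint32_t
flow_hash(const struct pkt_meta *m)
{
	/* A valid flow id skips header parsing and hashing entirely. */
	if (m->flowid & FLOWID_VALID)
		return m->flowid & FLOWID_MASK;
	return fnv1a(m->hdr, m->hdrlen);
}
```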


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
information in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way to do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids getting trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should never be called as we don't set up its
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rdnvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@
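
The MTU rule described above can be sketched like this. `struct trunk` and `trunk_port_add()` are simplified, hypothetical names rather than the driver's real structures, and returning EINVAL on a mismatch is an assumption.

```c
#include <errno.h>

/* Hypothetical, simplified trunk state; not the driver's real softc. */
struct trunk {
	unsigned int	nports;
	unsigned int	mtu;
};

int
trunk_port_add(struct trunk *tr, unsigned int port_mtu)
{
	if (tr->nports == 0)
		tr->mtu = port_mtu;	/* first port sets the trunk MTU */
	else if (tr->mtu != port_mtu)
		return EINVAL;		/* later ports must match it */
	tr->nports++;
	return 0;
}
```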


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@
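
A user-space sketch of the copy-before-transmit dance follows. `struct pkt`, `struct port`, `port_send()` and `bcast_send()` are hypothetical stand-ins for mbufs, trunk ports and the driver's routines. The toy `port_send()` deliberately scribbles on the packet to mimic an interface modifying the chain, so each copy has to be taken from the still-untouched original.

```c
#include <stdlib.h>
#include <string.h>

struct pkt {
	size_t		len;
	unsigned char	data[64];
};

struct port {
	int		active;
	int		sent;		/* packets handed to this port */
	unsigned char	first_byte;	/* first byte the port saw */
};

struct pkt *
pkt_dup(const struct pkt *p)
{
	struct pkt *copy = malloc(sizeof(*copy));

	if (copy != NULL)
		memcpy(copy, p, sizeof(*copy));
	return copy;
}

/* Stand-in for a driver: consumes the packet and may modify it. */
void
port_send(struct port *tp, struct pkt *p)
{
	tp->sent++;
	tp->first_byte = p->data[0];
	memset(p->data, 0xff, sizeof(p->data));
	free(p);
}

/* Always consumes p. Returns 0 on success, -1 on failure. */
int
bcast_send(struct port *ports, size_t nports, struct pkt *p)
{
	struct port *last = NULL;
	size_t i;

	for (i = 0; i < nports; i++) {
		if (!ports[i].active)
			continue;
		if (last != NULL) {
			/* Copy the pristine original before it is sent. */
			struct pkt *copy = pkt_dup(p);

			if (copy == NULL) {
				free(p);
				return -1;
			}
			port_send(last, copy);
		}
		last = &ports[i];
	}
	if (last == NULL) {
		free(p);
		return -1;
	}
	port_send(last, p);	/* the last port gets the original */
	return 0;
}
```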


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interface's start
routine.

it also prevents an artificial inflation of the physical interfaces queue
length where packets could hide on the virtual interfaces queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@
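
The failure mode is a modulo by the port count when that count is zero. A minimal sketch of a guard, using the hypothetical helper `pick_port()` (not a function from the driver):

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a flow hash to a port index.
 * Returns -1 when there are no ports, so the caller can drop the
 * packet instead of dividing by zero.
 */
int
pick_port(uint32_t hash, unsigned int port_count)
{
	if (port_count == 0)
		return -1;
	return (int)(hash % port_count);
}
```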


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as theyre figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frame on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


# 1.143 07-Nov-2019 dlg

turn the linkstate hooks into a task list, like the detach hooks.

this is largely mechanical, except for carp. this moves the addition
of the carp link state hook after we're committed to using the new
interface as a carpdev. because the add can't fail, we avoid a
complicated unwind dance. also, this tweaks the carp linkstate hook
so it only updates the relevant carp interface, not all of the
carpdevs on the parent.

hrvoje popovski has tested an early version of this diff and it's
generally ok, but there's some splasserts that this diff fires that
i'll fix in an upcoming diff.

ok claudio@


# 1.142 06-Nov-2019 dlg

replace the hooks used with if_detachhooks with a task list.

the main semantic change is that things registering detach hooks
have to allocate and set a task structure that then gets added to
the list. this means if the task is allocated up front (eg, as part
of carps softc or bridges port structure), it avoids the possibility
that adding a hook can fail. a lot of drivers weren't checking for
failure, and unwinding state in the event of failure in other parts
was error prone.

while doing this i discovered that the list operations have to be
in a particular order, but drivers weren't doing that consistently
either. this diff wraps the list ops up so you have to seriously
go out of your way to screw them up.

ive also sprinkled some NET_ASSERT_LOCKED around the list operations
so we can make sure there's no potential for the list to be corrupted,
especially while it's being run.

hrvoje popovski has tested this a bit, and some issues he discovered
have been fixed.

ok sashan@


Revision tags: OPENBSD_6_6_BASE
# 1.141 05-Jul-2019 dlg

record when trunk takes over an interface by setting ac_trunkport

this will be used to prevent trunk and the upcoming aggr driver
from taking ownership of an Ethernet interface at the same time.


# 1.140 11-May-2019 florian

A trunk(4) usually stays up when the link state of one of its members
changes. While we do get RTM_IFINFO messages for the (physical) member
interfaces there is no indication that something changed from the
trunk(4) interface.
It is helpful to get this information in userland from the trunk so that
userland daemons do not need to track interface membership by themselves.
OK phessler


# 1.139 29-Apr-2019 dlg

tr_unit is unused, so gc it


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this lets input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_cast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restores the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
information in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way to do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids getting trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should never be called as we don't set up its
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at
anything below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interfaces start
routine.

it also prevents an artificial inflation of the physical interfaces queue
length where packets could hide on the virtual interfaces queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as theyre figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frame on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


# 1.142 06-Nov-2019 dlg

replace the hooks used with if_detachhooks with a task list.

the main semantic change is that things registering detach hooks
have to allocate and set a task structure that then gets added to
the list. this means if the task is allocated up front (eg, as part
of carps softc or bridges port structure), it avoids the possibility
that adding a hook can fail. a lot of drivers weren't checking for
failure, and unwinding state in the event of failure in other parts
was error prone.

while doing this i discovered that the list operations have to be
in a particular order, but drivers weren't doing that consistently
either. this diff wraps the list ops up so you have to seriously
go out of your way to screw them up.

ive also sprinkled some NET_ASSERT_LOCKED around the list operations
so we can make sure there's no potential for the list to be corrupted,
especially while it's being run.

hrvoje popovski has tested this a bit, and some issues he discovered
have been fixed.

ok sashan@


Revision tags: OPENBSD_6_6_BASE
# 1.141 05-Jul-2019 dlg

record when trunk takes over an interface by setting ac_trunkport

this will be used to prevent trunk and the upcoming aggr driver
from taking ownership of an Ethernet interface at the same time.


# 1.140 11-May-2019 florian

A trunk(4) usually stays up when the link state of one of its members
changes. While we do get RTM_IFINFO messages for the (physical) member
interfaces there is no indication that something changed from the
trunk(4) interface.
It is helpful to get this information in userland from the trunk so that
userland daemons do not need to track interface membership by themselves.
OK phessler


# 1.139 29-Apr-2019 dlg

tr_unit is unused, so gc it


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this lets input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_cast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@
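
A minimal userland sketch of the class of bug described above, not the driver's actual code: a return variable that is overwritten on every loop iteration lets a later success hide an earlier failure. All names here (sk_send_demo, sk_cast_buggy, sk_cast_fixed) are hypothetical and chosen only for illustration.

```c
#include <errno.h>

/* Hypothetical demo sender: port 0 fails, every other port succeeds. */
int
sk_send_demo(int port)
{
	return (port == 0) ? ENOBUFS : 0;
}

/*
 * Buggy pattern: "ret" is overwritten on each iteration, so only the
 * result of the last port survives and an earlier error is lost.
 */
int
sk_cast_buggy(int nports, int (*tx)(int))
{
	int i, ret = 0;

	for (i = 0; i < nports; i++)
		ret = tx(i);
	return ret;
}

/*
 * Fixed pattern: remember that something failed and return a fixed
 * non-zero error (ENOBUFS) instead of reusing a scratch variable.
 */
int
sk_cast_fixed(int nports, int (*tx)(int))
{
	int i, failed = 0;

	for (i = 0; i < nports; i++) {
		if (tx(i) != 0)
			failed = 1;
	}
	return failed ? ENOBUFS : 0;
}
```

With three ports and the demo sender, the buggy variant reports success although port 0 failed, while the fixed variant reports ENOBUFS.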


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restores the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
information in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way to do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids getting trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should never be called as we don't set up its
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@
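
The MTU rule described above can be sketched in plain userland C. This is an illustration under stated assumptions, not the driver's code: the struct, the function name and the choice of EINVAL as the rejection error are all hypothetical.

```c
#include <errno.h>

/* Hypothetical, simplified stand-in for the trunk state. */
struct sk_trunk {
	unsigned int mtu;	/* MTU currently set on the trunk */
	int nports;		/* number of attached ports */
};

/*
 * The first port sets the trunk's MTU; any additional port must
 * already have the same MTU, otherwise it is rejected.
 * EINVAL is an assumption made for this sketch.
 */
int
sk_trunk_add_port(struct sk_trunk *tr, unsigned int port_mtu)
{
	if (tr->nports == 0)
		tr->mtu = port_mtu;
	else if (tr->mtu != port_mtu)
		return EINVAL;
	tr->nports++;
	return 0;
}
```

Adding a 9000-byte port to an empty trunk sets the trunk MTU to 9000; a following 1500-byte port is refused and the port count stays unchanged.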


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@
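
A simplified userland sketch of the "copy before transmit, last port takes the original" pattern described above. It does not use mbufs or any kernel API; struct sk_pkt, sk_port_send() and sk_bcast() are hypothetical stand-ins, and the send routine deliberately modifies the packet to mimic a port that touches the chain.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat packet, standing in for an mbuf chain. */
struct sk_pkt {
	size_t len;
	unsigned char data[64];
};

/* Records the first byte each port saw, for inspection afterwards. */
unsigned char sk_seen[8];
int sk_nseen;

/* Stand-in transmit: consumes the packet and may modify it first. */
static void
sk_port_send(struct sk_pkt *p)
{
	if (sk_nseen < 8)
		sk_seen[sk_nseen] = p->data[0];
	sk_nseen++;
	p->data[0] ^= 0xff;	/* the port is free to scribble on it */
	free(p);
}

static struct sk_pkt *
sk_pkt_copy(const struct sk_pkt *p)
{
	struct sk_pkt *c = malloc(sizeof(*c));

	if (c != NULL)
		memcpy(c, p, sizeof(*c));
	return c;
}

/*
 * Send p on nports ports. Every port but the last gets a fresh copy
 * taken from the still-untouched original; the last port, handled
 * outside the loop, consumes the original itself.
 * Returns the number of ports sent on, or -1 if a copy failed.
 */
int
sk_bcast(struct sk_pkt *p, int nports)
{
	int i;

	if (nports <= 0) {
		free(p);
		return 0;
	}
	for (i = 0; i < nports - 1; i++) {
		struct sk_pkt *c = sk_pkt_copy(p);

		if (c == NULL) {
			free(p);
			return -1;
		}
		sk_port_send(c);
	}
	sk_port_send(p);
	return nports;
}
```

Because each copy is made from the original before any port has seen it, a port that modifies its packet cannot affect what the other ports transmit.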


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interface's start
routine.

it also prevents an artificial inflation of the physical interfaces queue
length where packets could hide on the virtual interfaces queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@
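
The failure mode above comes from reducing a flow hash modulo the number of ports when that number is zero. A minimal sketch of the guard, assuming a hypothetical helper name and a -1 "no port" return value that are not taken from the driver:

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a flow hash to a port index.
 * Returns -1 when the trunk has no ports instead of evaluating
 * hash % 0, which would be a division by zero.
 */
int
sk_pick_port(uint32_t hash, unsigned int port_count)
{
	if (port_count == 0)
		return -1;
	return (int)(hash % port_count);
}
```

A caller would treat -1 as "nothing to send on" and drop the packet rather than dereference a port.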


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as theyre figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frame on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.141 05-Jul-2019 dlg

record when trunk takes over an interface by setting ac_trunkport

this will be used to prevent trunk and the upcoming aggr driver
from taking ownership of an Ethernet interface at the same time.


# 1.140 11-May-2019 florian

A trunk(4) usually stays up when the link state of one of its members
changes. While we do get RTM_IFINFO messages for the (physical) member
interfaces there is no indication that something changed from the
trunk(4) interface.
It is helpful to get this information in userland from the trunk so that
userland daemons do not need to track interface membership by themselves.
OK phessler


# 1.139 29-Apr-2019 dlg

tr_unit is unused, so gc it


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this lets input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The options apply per "trunk" rather than per "port", as is typically
seen on various NOSes (JunOS, NXOS, etc...), because it's uncommon for a
host to have one link "Passive" and the other "Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_cast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@
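
Illustration (not part of the commit): a simplified userland model of the
problem described above. The function names and the `results` array are
hypothetical; counting failures is one assumed way to get the "return
ENOBUFS (non-0) on error" behaviour and is not taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model: results[i] is the status of the i-th per-port
 * enqueue (0 on success, an errno value on failure).
 */

/* Buggy shape: "ret" is reused, so a later success hides an earlier error. */
static int
send_all_reused_ret(const int *results, size_t n)
{
	int ret = 0;
	size_t i;

	assert(results != NULL || n == 0);
	for (i = 0; i < n; i++)
		ret = results[i];	/* overwrites any earlier failure */
	return (ret);
}

/* Fixed shape: no shared "ret"; any failure yields a non-zero errno. */
static int
send_all_count_errors(const int *results, size_t n)
{
	unsigned int errors = 0;
	size_t i;

	assert(results != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (results[i] != 0)
			errors++;
	}
	return (errors != 0 ? ENOBUFS : 0);
}
```

With a failure followed by a success, the first function reports success
while the second still reports an error.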


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restores the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
information in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way to do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids giving trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should never be called as we don't set up its
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers: we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@
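
Illustration (not part of the commit): a minimal model of the MTU rule
described above. `struct trunk_model` and `trunk_model_add_port()` are
hypothetical names, and returning EINVAL on a mismatch is an assumption
made for the sketch, not a statement about the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, minimal trunk state for illustration only. */
struct trunk_model {
	unsigned int	mtu;	/* MTU currently set on the trunk */
	size_t		nports;	/* number of attached ports */
};

/*
 * Attach a port with the given MTU.  The first port defines the trunk's
 * MTU; every later port must already match it or is rejected.
 */
static int
trunk_model_add_port(struct trunk_model *tr, unsigned int port_mtu)
{
	assert(tr != NULL);
	if (tr->nports == 0)
		tr->mtu = port_mtu;		/* inherit from first port */
	else if (tr->mtu != port_mtu)
		return (EINVAL);		/* assumed error code */
	tr->nports++;
	return (0);
}
```

Under this model, a trunk whose first port uses a jumbo MTU adopts that
value, and a second port left at a different MTU is refused until it is
reconfigured to match.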


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@
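
Illustration (not part of the commit): the copy-before-transmit pattern
described above, modelled in userland C. `struct pkt` stands in for an
mbuf chain and `port_send()` for the per-port transmit path; both are
hypothetical. The sketch assumes a sender consumes, and may modify, the
buffer it is given.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf chain. */
struct pkt {
	size_t	len;
	char	data[64];
};

/* Hypothetical port that records the payload it was handed. */
struct port {
	char	seen[64];
};

static struct pkt *
pkt_copy(const struct pkt *p)
{
	struct pkt *n = malloc(sizeof(*n));

	if (n != NULL)
		memcpy(n, p, sizeof(*n));
	return (n);
}

/* Consumes p: records it, scribbles on it (as a driver might), frees it. */
static void
port_send(struct port *tp, struct pkt *p)
{
	assert(p->len < sizeof(tp->seen));
	memcpy(tp->seen, p->data, p->len);
	tp->seen[p->len] = '\0';
	memset(p->data, 'X', p->len);
	free(p);
}

/*
 * Send p on every port.  Each port except the last gets a copy made
 * *before* anything is transmitted from it; the last port gets the
 * original, outside the loop.  Consumes p in all cases.
 */
static int
bcast_send(struct port *ports, size_t nports, struct pkt *p)
{
	struct pkt *c;
	size_t i;

	if (nports == 0) {
		free(p);
		return (-1);
	}
	for (i = 0; i + 1 < nports; i++) {
		if ((c = pkt_copy(p)) == NULL) {
			free(p);
			return (-1);
		}
		port_send(&ports[i], c);
	}
	port_send(&ports[nports - 1], p);
	return (0);
}
```

Because every copy is taken from the still-untouched original, a port
that modifies its buffer cannot corrupt what the next port receives.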


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interface's start
routine.

it also prevents an artificial inflation of the physical interface's queue
length where packets could hide on the virtual interfaces' queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as theyre figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frame on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.140 11-May-2019 florian

A trunk(4) usually stays up when the link state of one of its members
changes. While we do get RTM_IFINFO messages for the (physical) member
interfaces there is no indication that something changed from the
trunk(4) interface.
It is helpful to get this information in userland from the trunk so that
userland daemons do not need to track interface membership by themselves.
OK phessler


# 1.139 29-Apr-2019 dlg

tr_unit is unused, so gc it


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this let's input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_cast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restores the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
information in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way to do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids getting trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should never be called as we don't set up its
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers; we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such a mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interface's start
routine.

it also prevents an artificial inflation of the physical interfaces queue
length where packets could hide on the virtual interfaces queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as theyre figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frames on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


# 1.139 29-Apr-2019 dlg

tr_unit is unused, so gc it


# 1.138 23-Apr-2019 dlg

a first cut at converting some virtual ethernet interfaces to if_vinput

this let's input processing bypass ifiqs. there's a performance
benefit from this, and it will let me tweak the backpressure detection
mechanism that ifiqs use without impacting on a stack of virtual
interfaces.

ive tested all of these except mpw, which i will end up testing
soon anyway.


Revision tags: OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer mess with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_cast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restore the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@
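
A rough sketch of the idea, assuming a simplified packet structure
rather than the real mbuf packet header. struct pkt, slow_hash() and
pick_hash() are hypothetical, and slow_hash() uses FNV-1a purely as a
placeholder for the keyed siphash24 call:

```c
#include <stdint.h>

/* Simplified stand-in for the packet header fields involved. */
struct pkt {
	int		has_flowid;	/* nonzero when flowid is valid */
	uint32_t	flowid;		/* flow id set earlier in the stack */
	uint32_t	src, dst;	/* fields that would need parsing */
};

/* Placeholder for the expensive hash over parsed header fields. */
static uint32_t
slow_hash(const struct pkt *p)
{
	uint32_t h = 2166136261u;	/* FNV-1a, for illustration only */

	h = (h ^ p->src) * 16777619u;
	h = (h ^ p->dst) * 16777619u;
	return (h);
}

/* Prefer a valid flow id; fall back to hashing the headers. */
uint32_t
pick_hash(const struct pkt *p)
{
	if (p->has_flowid)
		return (p->flowid);
	return (slow_hash(p));
}
```

The saving comes from skipping both the header parsing and the hash
whenever the flow id is already present.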


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
information in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way to do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids getting trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should never be called as we don't set up its
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers, since we know the
ifp is live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such a mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
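
A small sketch of the index-based lookup described above, assuming a
fixed-size table instead of the kernel's real interface index map.
struct ifnet_sketch, iftable, if_lookup() and deliver() are hypothetical
names:

```c
#include <stddef.h>

#define MAXIF	4

struct ifnet_sketch {
	unsigned int	index;
};

/* index -> interface; a slot becomes NULL once the interface is gone */
struct ifnet_sketch *iftable[MAXIF];

/* Resolve a stored index; NULL means the interface no longer exists. */
struct ifnet_sketch *
if_lookup(unsigned int idx)
{
	return (idx < MAXIF ? iftable[idx] : NULL);
}

/* Returns 1 if the packet can be delivered, 0 if it must be freed. */
int
deliver(unsigned int rcvif_idx)
{
	struct ifnet_sketch *ifp = if_lookup(rcvif_idx);

	if (ifp == NULL)
		return (0);	/* interface removed: drop the packet */
	return (1);
}
```

Because a packet carries only the index, a removed interface shows up as
a failed lookup rather than a dangling pointer.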


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@
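
A sketch of the MTU rule described above under simplifying assumptions.
struct trunk_sketch and trunk_add_port_mtu() are hypothetical, and the
EINVAL return value is an assumption, not taken from the driver:

```c
#include <errno.h>

struct trunk_sketch {
	unsigned int	nports;	/* ports currently in the trunk */
	unsigned int	mtu;	/* MTU of the trunk interface */
};

/* Apply the MTU rule when a port joins; returns 0 or an errno. */
int
trunk_add_port_mtu(struct trunk_sketch *tr, unsigned int port_mtu)
{
	if (tr->nports == 0)
		tr->mtu = port_mtu;	/* first port defines the MTU */
	else if (port_mtu != tr->mtu)
		return (EINVAL);	/* later ports must match */
	tr->nports++;
	return (0);
}
```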


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@
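
A simplified sketch of the copy-before-transmit pattern, using plain
heap buffers instead of mbuf chains. struct frame, port_send(),
frame_copy() and bcast_send() are hypothetical; port_send() deliberately
modifies the frame to mimic a driver touching the chain:

```c
#include <stdlib.h>
#include <string.h>

#define NPORTS_MAX	8

struct frame {
	size_t		len;
	unsigned char	data[64];
};

/* Records the first byte each port actually transmitted. */
unsigned char sent_first_byte[NPORTS_MAX];

/* Hypothetical transmit: may modify the frame, then consumes it. */
static void
port_send(int port, struct frame *f)
{
	sent_first_byte[port] = f->data[0];
	f->data[0] = 0xff;	/* the "driver" mangles the frame */
	free(f);
}

static struct frame *
frame_copy(const struct frame *f)
{
	struct frame *c = malloc(sizeof(*c));

	if (c != NULL)
		memcpy(c, f, sizeof(*c));
	return (c);
}

/* Sends f on every port and consumes it; returns the number sent. */
int
bcast_send(struct frame *f, int nports)
{
	int i, sent = 0;

	if (nports <= 0 || nports > NPORTS_MAX) {
		free(f);
		return (0);
	}
	for (i = 0; i < nports - 1; i++) {
		/* copy while f is still untouched by any driver */
		struct frame *c = frame_copy(f);

		if (c == NULL)
			continue;	/* skip this port on failure */
		port_send(i, c);
		sent++;
	}
	port_send(nports - 1, f);	/* last port gets the original */
	return (sent + 1);
}
```

Every copy is taken from the untouched original, so no port ever sees a
frame that a previous port has already modified.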


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@
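
A small sketch of the two addresses involved, assuming a simplified
port structure. struct port_sketch, port_join() and port_leave() are
hypothetical names, not the driver's functions:

```c
#include <string.h>

#define ETHER_LEN	6

struct port_sketch {
	unsigned char	lladdr[ETHER_LEN];	/* address in use */
	unsigned char	saved[ETHER_LEN];	/* original, for removal */
};

/* A joining port adopts the trunk's current MAC, not a saved one. */
void
port_join(struct port_sketch *p, const unsigned char *trunk_lladdr)
{
	memcpy(p->saved, p->lladdr, ETHER_LEN);
	memcpy(p->lladdr, trunk_lladdr, ETHER_LEN);
}

/* The saved MAC is only used to restore the port on removal. */
void
port_leave(struct port_sketch *p)
{
	memcpy(p->lladdr, p->saved, ETHER_LEN);
}
```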


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at
anything below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interface's start
routine.

it also prevents an artificial inflation of the physical interfaces queue
length where packets could hide on the virtual interfaces queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@
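
A sketch of what leaving the priority out of the hash can look like,
assuming the standard 802.1Q tag layout (3 priority bits, 1 DEI bit,
12 VLAN ID bits). VLAN_ID_MASK and vlan_hash_input() are hypothetical
names chosen for this example:

```c
#include <stdint.h>

/* Low 12 bits of the 802.1Q tag control field hold the VLAN ID. */
#define VLAN_ID_MASK	0x0fffu

/* Strip priority/DEI bits so they do not influence port selection. */
uint16_t
vlan_hash_input(uint16_t tci)
{
	return (tci & VLAN_ID_MASK);
}
```

Two frames of the same flow with different priorities then feed the same
value into the hash and stay on the same port.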


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code, rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@
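
A minimal sketch of the guard, assuming the port is chosen as hash
modulo port count. lb_pick_port() is a hypothetical helper and the
ENOENT return value is an assumption:

```c
#include <errno.h>
#include <stdint.h>

/* Choose a port index for a flow hash; *idx is set only on success. */
int
lb_pick_port(uint32_t hash, unsigned int count, unsigned int *idx)
{
	if (count == 0)
		return (ENOENT);	/* no trunkports: avoid hash % 0 */
	*idx = hash % count;
	return (0);
}
```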


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as theyre figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frames on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@
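
The rule above can be sketched as follows, assuming a two-state link
model. enum link and trunk_link() are hypothetical names:

```c
#include <stddef.h>

enum link { LINK_DOWN = 0, LINK_UP = 1 };

/* The trunk is up as long as at least one trunkport is up. */
enum link
trunk_link(const enum link *ports, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (ports[i] == LINK_UP)
			return (LINK_UP);
	}
	return (LINK_DOWN);
}
```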


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.
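
One way to read "smallest common interface capabilities" is a bitwise
AND over the capability flags of all ports. The sketch below assumes
that reading; CAP_VLAN_MTU, CAP_CSUM and common_caps() are hypothetical
and do not use the kernel's real flag values:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits for this example only. */
#define CAP_VLAN_MTU	0x01u
#define CAP_CSUM	0x02u

/* A capability survives only if every attached port has it. */
uint32_t
common_caps(const uint32_t *port_caps, size_t n)
{
	uint32_t caps;
	size_t i;

	if (n == 0)
		return (0);
	caps = port_caps[0];
	for (i = 1; i < n; i++)
		caps &= port_caps[i];
	return (caps);
}
```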


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


# 1.137 12-Aug-2018 ccardenas

Add administrative options to LACP trunk implementation.

The trunk driver now has a new ioctl (SIOCxTRUNKOPTS), which for now only
has options for LACP:
* Mode - Active or Passive (default Active)
* Timeout - Fast or Slow (default Slow)
* System Priority - 1(high) to 65535(low) (default 32768/0x8000)
* Port Priority - 1(high) to 65535(low) (default 32768/0x8000)
* IFQ Priority - 0 to NUM_QUEUES (default 6)

At the moment, ifconfig only has options for lacpmode and lacptimeout
plumbed as those are the immediate need.

The approach taken for the options was to make them on a "trunk" vs a
"port" as what's typically seen on various NOSes (JunOS, NXOS, etc...)
as it's uncommon for a host to have one link "Passive" and the other
"Active" in a given trunk.

Just like on a NOS, when applying lacpmode or lacptimeout, the settings
are immediately applied to all existing ports in the trunk and to all
future ports brought into the trunk.

Tested by many on a plethora of NIC drivers and switches.

Ok remi@


Revision tags: OPENBSD_6_3_BASE
# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer mess with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_cast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restore the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
informations in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way to do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids getting trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should never be called as we don't set up its
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with a similar
lifetime.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interface's start
routine.

it also prevents an artificial inflation of the physical interfaces queue
length where packets could hide on the virtual interfaces queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as theyre figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frame on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


# 1.136 19-Feb-2018 mpi

Remove almost unused `flags' argument of suser().

The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.

No objection from millert@, ok tedu@, bluhm@


# 1.135 09-Jan-2018 bluhm

Creating a cloned interface could return ENOMEM due to temporary
memory shortage. As it is invoked from a system call, it should
not fail and wait instead.
OK visa@ mpi@


Revision tags: OPENBSD_6_2_BASE
# 1.134 14-Aug-2017 reyk

The "ret" return value is reused and overwritten, potentially
returning 0 (success) on error instead of an error number. The caller
doesn't evaluate the return value, so it is good enough to return
ENOBUFS (non-0) on error and to remove "ret" in trunk_bcast_start().

Coverity CID 1453105; Severity: Minor

OK mpi@


# 1.133 11-Aug-2017 mpi

Remove NET_LOCK()'s argument.

Tested by Hrvoje Popovski, ok bluhm@


# 1.132 28-May-2017 mpi

Add missing NET_UNLOCK() in error path.

Spotted by sashan@


# 1.131 28-May-2017 mpi

trunk_port_destroy() needs the NET_LOCK().

It brings the interface down and restores the original lladdr.

Found by Hrvoje Popovski


# 1.130 28-May-2017 mpi

Remove useless splnet()/splx() dances.

Data structures modified in the ioctl path are protected by the NET_LOCK().

ok sashan@


Revision tags: OPENBSD_6_1_BASE
# 1.129 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.128 16-Sep-2016 mikeb

Reconfigure interface capabilities after switching trunkproto; ok mpi


Revision tags: OPENBSD_6_0_BASE
# 1.127 13-Apr-2016 mpi

We're always ready! So send IFQ_SET_READY() to the bitbucket.


Revision tags: OPENBSD_5_9_BASE
# 1.126 31-Dec-2015 sthen

Move tr_port_destroy down; fixes 'lacp_compose_key protection fault trap'
when removing a port from a lacp trunk. Part of a larger diff from mpi,
as suggested by mikeb. ok mpi@


# 1.125 21-Nov-2015 dlg

dont check IFF_OACTIVE to see if the port is busy.

dont check if its busy at all, actually.

fine with reyk@


# 1.124 20-Nov-2015 dlg

dont play with IFF_OACTIVE needlessly.

only a driver sets or clears it, and trunk never sets it. therefore it
never needs to clear it.


# 1.123 12-Nov-2015 mpi

Prefix flowid with ph_ and print it in m_print().

ok dlg@


# 1.122 25-Oct-2015 mpi

arp_ifinit() is no longer required.


# 1.121 08-Oct-2015 mikeb

Make sure that when trunk_port_ioctl is called to set a new
lladdr the trunk port is already on the list.

OK mpi


# 1.120 08-Oct-2015 dlg

if the mbuf has a valid flowid, use it instead of using siphash24
and a bunch of header fields we have to parse the mbuf for.

siphash24 is about 20% of the cost of sending a udp packet on a
trunk interface with tcpbench on my box. if there's a flowid set
we get all that back.

ok mpi@ mikeb@ sthen@


# 1.119 05-Oct-2015 mikeb

Factor LACP frame processing out to a separate task

This is slightly refactored version of the diff by jmatthew@
that makes use of a single per-trunk task but retains per-port
mbuf queues.

Running LACP frame processing in a task context allows a simple
way to synchronize changes to the trunk ports and trunk itself
performed from the ioctl, timeout and task contexts with a kernel
lock.

OK mpi


# 1.118 29-Sep-2015 deraadt

add sizes to some of the simpler free calls
ok mpi


# 1.117 28-Sep-2015 mpi

Remove "if_tp" from the "struct ifnet".

Instead of violating a layer of abstraction by keeping per pseudo-driver
information in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).

The time of per pseudo-driver hack in the network stack is over!

ok mikeb@


# 1.116 24-Sep-2015 mikeb

add a comment explaining how we serialize when switching trunkproto;
requested by mpi@


# 1.115 24-Sep-2015 mikeb

Avoid a theoretical m_pullup(9) mishandling by delegating the mbuf
reclaiming to the PDU and marker input routines.

m_pullup may return a pointer to the newly allocated mbuf. In this
case m_freem is called by the trunk_input, not by the proto specific
code and pointer to the mbuf is not passed by reference. Therefore
m_freem will either be called on the middle element of the chain
(when the m_pullup call succeeds) or on the stale pointer (when it
frees the chain in the failure case). Fortunately we should never
hit this case as the receive path uniformly uses contiguous chunks
of memory.

Verified with and ok blambert, ok mpi


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way to do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids getting trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should never be called as we don't set up its
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with a similar
lifetime.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and send a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers; we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroyed/removed and the mbuf should be freed.

Such a mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@
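
The index-instead-of-pointer idea can be sketched in userland C. This is an illustration only: `sk_iface`, `sk_if_table`, `sk_iface_get()` and `sk_pkt_input()` are hypothetical names, and the sketch ignores the reference counting a real lookup would need.

```c
#include <stddef.h>

#define SK_MAX_IFS 8	/* hypothetical table size */

/* Hypothetical stand-in for an interface descriptor. */
struct sk_iface {
	const char *name;
};

/* Hypothetical index-to-interface table; a NULL slot means "gone". */
static struct sk_iface *sk_if_table[SK_MAX_IFS];

/* Hypothetical stand-in for a packet header carrying only an index. */
struct sk_pkt {
	unsigned int rcvif_idx;
};

/* Look up an interface by index; NULL if out of range or removed. */
static struct sk_iface *
sk_iface_get(unsigned int idx)
{
	if (idx >= SK_MAX_IFS)
		return (NULL);
	return (sk_if_table[idx]);
}

/*
 * Process a packet.  Returns 1 if the receiving interface still
 * exists, 0 if it is gone and the caller should free the packet.
 */
static int
sk_pkt_input(const struct sk_pkt *p)
{
	struct sk_iface *ifp = sk_iface_get(p->rcvif_idx);

	if (ifp == NULL)
		return (0);
	return (1);
}
```

Because the packet holds an index rather than a pointer, a packet that outlives its interface produces a failed lookup instead of a dangling dereference.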


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function to do the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register its own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register its own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@
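
The MTU rule described above can be sketched as follows. This is an illustration under assumptions, not the driver's code: `sk_trunk` and `sk_trunk_add_port()` are hypothetical names, and the -1 return stands in for whatever errno the driver reports.

```c
/* Hypothetical stand-in for the trunk's state. */
struct sk_trunk {
	unsigned int mtu;	/* current trunk MTU */
	unsigned int nports;	/* number of member ports */
};

/*
 * Add a port with the given MTU.  The first port defines the trunk
 * MTU; every later port must match it.  Returns 0 on success and -1
 * on a mismatch, leaving the trunk unchanged.
 */
static int
sk_trunk_add_port(struct sk_trunk *tr, unsigned int port_mtu)
{
	if (tr->nports == 0)
		tr->mtu = port_mtu;
	else if (port_mtu != tr->mtu)
		return (-1);
	tr->nports++;
	return (0);
}
```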


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@
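
The copy-before-transmit pattern described above can be sketched in userland C. This is only an illustration: `sk_pkt`, `sk_port`, `sk_pkt_copy()`, `sk_port_send()` and `sk_bcast_send()` are hypothetical stand-ins for the mbuf chain, the trunk port, m_copym() and the per-port enqueue.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf chain. */
struct sk_pkt {
	size_t		len;
	unsigned char	data[64];
};

/* Hypothetical stand-in for a trunk port; records what it was handed. */
struct sk_port {
	int		sent;
	unsigned char	first_byte_seen;
};

/* Duplicate a packet; returns NULL on allocation failure. */
static struct sk_pkt *
sk_pkt_copy(const struct sk_pkt *p)
{
	struct sk_pkt *c = malloc(sizeof(*c));

	if (c != NULL)
		memcpy(c, p, sizeof(*c));
	return (c);
}

/*
 * Hand a packet to a port.  The port owns the packet from here on and
 * may modify it, so the caller must not touch it afterwards.
 */
static void
sk_port_send(struct sk_port *tp, struct sk_pkt *p)
{
	tp->sent++;
	tp->first_byte_seen = p->data[0];
	p->data[0] = 0xff;	/* simulate the port modifying the packet */
	free(p);
}

/*
 * Send p on every port.  A copy is taken before each transmit and the
 * original goes to the last port, outside the loop.  Consumes p.
 * Returns 0 on success, -1 if a copy could not be allocated.
 */
static int
sk_bcast_send(struct sk_port *ports, size_t nports, struct sk_pkt *p)
{
	size_t i;

	if (nports == 0) {
		free(p);
		return (0);
	}
	for (i = 0; i + 1 < nports; i++) {
		struct sk_pkt *c = sk_pkt_copy(p);

		if (c == NULL) {
			free(p);
			return (-1);
		}
		sk_port_send(&ports[i], c);
	}
	sk_port_send(&ports[nports - 1], p);
	return (0);
}
```

Every copy is made from a packet that no port has seen yet, so a port that modifies its packet cannot affect what the next port receives.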


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at
anything below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interface's start
routine.

it also prevents an artificial inflation of the physical interface's queue
length where packets could hide on the virtual interfaces' queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@
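
The guard can be sketched as below. This is a minimal illustration that assumes the port is chosen as hash modulo port count; `sk_lb_pick_port()` is a hypothetical name and the -1 return is an assumption about how "no port" might be signalled.

```c
#include <stdint.h>

/*
 * Pick a port index for a flow hash.  Returns -1 when the trunk has
 * no ports, which avoids the modulo by zero.
 */
static int
sk_lb_pick_port(uint32_t hash, unsigned int count)
{
	if (count == 0)
		return (-1);
	return ((int)(hash % count));
}
```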


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as they're figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frame on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@
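
The failover choice described above can be sketched like this. It is a simplified illustration: `sk_fo_port` and `sk_failover_pick()` are hypothetical names, and "next active port" is approximated as the first port in the array with link.

```c
#include <stddef.h>

/* Hypothetical stand-in for a trunk port's link state. */
struct sk_fo_port {
	int link_up;
};

/*
 * Choose the port to use in failover mode.  The master is preferred
 * while its link is up; otherwise the first port with link is used.
 * Returns the port index, or -1 if no port has link.
 */
static int
sk_failover_pick(const struct sk_fo_port *ports, size_t nports,
    size_t master)
{
	size_t i;

	if (master < nports && ports[master].link_up)
		return ((int)master);
	for (i = 0; i < nports; i++) {
		if (ports[i].link_up)
			return ((int)i);
	}
	return (-1);
}
```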


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@


# 1.114 23-Sep-2015 mikeb

Serialize trunk changes with input handler insertion and removal.

This moves around calls to if_ih_insert and if_ih_remove to ensure
that we either have completed port initialization or are going to
tear the port configuration down and don't want any input processes
to get hold of the port.

When trunk_port_destroy is called from the ioctl this would wait for
all input processes to finish and release their references to be able
to disestablish the input handler and ensure full control of the port.

When switching trunkproto it is required for the ioctl context to
be able to touch all trunk ports and the protocol (tr_psc). The
easiest way do this is to disestablish all input handlers (while
making sure they all complete) and then reestablish them after the
trunk reconfiguration is completed.

This avoids getting trunk a separate locking protocol of its own.

ok mpi, suggested by and ok dlg


# 1.113 23-Sep-2015 mikeb

Keep track of an active port in the failover trunk to avoid list
iterations and additional locking protection in the future.

Suggested by and ok mpi


# 1.112 23-Sep-2015 mikeb

Remove trunk watchdog code since it doesn't do anything useful
and we want to limit the number of different places where we
access trunk port pointers.

trunk_watchdog should be never called as we don't set up it's
if_timer and trunk_port_watchdog just calls the if_watchdog
from the underlying interface.

It's possible that this is no longer needed due to if_slowtimo/
if_watchdog changes done earlier.

ok mpi


# 1.111 10-Sep-2015 mikeb

pass a cookie argument to interface input handlers that can be used
to pass additional context or transient data with the similar life
time.

ok mpi, suggestions, hand holding and ok from dlg


# 1.110 10-Sep-2015 dlg

move the if input handler list to an SRP list.

instead of having every driver that manipulates the ifih list
understand SRPLs, this moves that processing into if_ih_insert and
if_ih_remove functions.

we rely on the kernel lock to serialise the modifications to the
list.

tested by mpi@
ok mpi@ claudio@ mikeb@


Revision tags: OPENBSD_5_8_BASE
# 1.109 17-Jul-2015 mpi

Drop promiscuously received packets if the trunk(4) interface is not
in promiscuous mode.

The long story is that claudio@ had his ssh session reset multiple
times in the hackroom because czarkoff@'s machine was sending reset.
We figured out that the packet was reaching pf because of this missing
check. pf would then not find any state and sent a reset.

Analyzed with and ok phessler@, claudio@


# 1.108 02-Jul-2015 mpi

Unify the check for up & running between all pseudo-drivers.


# 1.107 02-Jul-2015 mpi

By design if_input_process() needs to hold a reference on the receiving
ifp in order to access its ifih handlers.

So get rid of if_get() in the various ifih handlers we know the ifp is
live at this point.

ok dlg@


# 1.106 30-Jun-2015 mpi

Rename if_output() into if_enqueue() to avoid confusion with comments
talking about (*ifp->if_output)().

ok claudio@, dlg@


# 1.105 29-Jun-2015 dlg

count if_ibytes in if_input like we do for if_ipackets.

tweaks and ok mpi@


# 1.104 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.103 16-Jun-2015 mpi

Store a unique ID, an interface index, rather than a pointer to the
receiving interface in the packet header of every mbuf.

The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroy/removed and the mbuf should be freed.

Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.

Tested by jmatthew@ and krw@, discussed with many.

ok mikeb@, bluhm@, dlg@


# 1.102 15-Jun-2015 mpi

Fix a double free in the destroy path triggered when a second process,
in my case dhclient(8), races with ifconfig(8) to free the descriptors
of the joined multicast groups.

While here reduce the difference with carp(4).

ok dms@


# 1.101 09-Jun-2015 mpi

Convert trunk(4) to if_input().

ok dlg@


# 1.100 26-May-2015 mpi

Now that the Ethernet header is always passed as part of the mbuf, kill
the second (unused) argument of the input packet handlers.

ok dlg@


# 1.99 15-May-2015 mpi

Introduce if_output(), a function do to the last steps before enqueuing
a packet on the sending queue of an interface.

Tested by many, thanks a lot!

ok dlg@, claudio@


# 1.98 14-May-2015 mpi

Allocate the input packet handler as part of the trunk_port structure
since they have the same lifetime.

Requested by and ok dlg@


# 1.97 13-May-2015 mpi

Get rid of the last "#if NTRUNK" by overwriting trunk ports' output
function.

ok claudio@, reyk@


# 1.96 11-May-2015 mpi

Take trunk(4) out of ether_input().

Each physical interface of a trunk now gets a specific ifih (interface
input handler) that runs before ether_input().

Tested by sthen@, dlg@, Theo Buehler and <mxb AT alumni.chalmers DOT se>

ok sthen@, dlg@


# 1.95 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.94 19-Dec-2014 tedu

unifdef INET in net code as a precursor to removing the pretend option.
long live the one true internet.
ok henning mikeb


# 1.93 04-Dec-2014 tedu

use siphash for trunk loadbalancing. ok deraadt


# 1.92 01-Dec-2014 mikeb

Make every interface with a watchdog register it's own slow timeout

This removes the system wide if_slowtimo timeout and lets every
interface with a valid if_watchdog method register it's own in
order to get rid of the ifnet loop in the softclock context and
avoid further complications with concurrent access to the ifnet
list.

ok deraadt, input and ok mpi, looked at by claudio


# 1.91 18-Nov-2014 tedu

move arc4random prototype to systm.h. more appropriate for most code
to include that than rndvar.h. ok deraadt dlg


Revision tags: OPENBSD_5_6_BASE
# 1.90 22-Jul-2014 mpi

Fewer <netinet/in_systm.h> !


# 1.89 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.88 09-Jul-2014 henning

bpf code surgery / shuffling / simplification.
the various bpf_mtap_* are very similar, they differ in what (and to some
extent how) they prepend something, and what copy function they pass to
bpf_catchpacket.
use an internal _bpf_mtap as "backend" for bpf_mtap and friends.
extend bpf_mtap_hdr so that it covers all common cases:
if dlen is 0, nothing gets prepended.
copy function can be given, if NULL the default bpf_mcopy is used.
adjust the existing bpf_mtap_hdr users to pass a NULL ptr for the copy fn.
re-implement bpf_mtap_af as simple wrapper for bpf_mtap_hdr.
re-implement bpf_mtap_ether using bpf_mtap_hdr
re-implement bpf_mtap_pflog as trivial bpf_mtap_hdr wrapper
ok bluhm benno


# 1.87 10-Mar-2014 mpi

if_lladdr -> if_sadl, no functional change.

ok mikeb@


Revision tags: OPENBSD_5_5_BASE
# 1.86 21-Nov-2013 mpi

Remove unneeded include.

ok deraadt@


# 1.85 18-Nov-2013 mpi

Convert trunk(4) to use a detachhook, discussed at b2k13 with many.

While here add a comment explaining detach hooks' order of execution when
destroying/detaching an interface.


Revision tags: OPENBSD_5_4_BASE
# 1.84 20-Jun-2013 mpi

Revert previous and unbreak asr, the new include should be protected.

Reported by naddy@


# 1.83 20-Jun-2013 mpi

Allocate the various hook head descriptors as part of the ifnet
structure rather than doing various M_WAITOK allocations during
the *attach() functions, we always rely on them anyway.

ok mikeb@, uebayasi@


# 1.82 11-May-2013 sthen

Set trunk(4)'s MTU to that of the first trunkport. Allows trunk to work with
jumbo/baby-jumbo frames. To avoid problems with mismatches between trunkports,
any additional ports must have the same MTU as already set on the trunk(4).
Based on changes made in FreeBSD. Tested by myself and jj@, ok reyk@
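
The MTU rule described in 1.82 can be illustrated with a small user-space
sketch. This is not the driver code: `struct trunk_sketch` and
`trunk_sketch_add_port()` are hypothetical names invented for this note, and
the return value of -1 is an assumption standing in for whatever error the
driver actually reports.

```c
#include <stddef.h>

/* Hypothetical stand-in for the trunk state; not the kernel structure. */
struct trunk_sketch {
	unsigned int	mtu;	/* MTU currently set on the trunk */
	unsigned int	nports;	/* number of ports already joined */
};

/*
 * Add a port with the given MTU.
 * The first port defines the trunk MTU; every later port must match it.
 * Returns 0 on success, -1 on an MTU mismatch (assumed error value).
 */
static int
trunk_sketch_add_port(struct trunk_sketch *tr, unsigned int port_mtu)
{
	if (tr->nports == 0)
		tr->mtu = port_mtu;		/* inherit from first port */
	else if (port_mtu != tr->mtu)
		return (-1);			/* reject mismatching port */

	tr->nports++;
	return (0);
}
```

With this rule a trunk whose first port uses a jumbo MTU accepts further
jumbo ports and refuses a port left at the default MTU.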


# 1.81 02-Apr-2013 mpi

Instead of storing the link-level address of every interface in a global
array indexed by interface numbers, add a new field to the interface
descriptor pointing to it.

claudio@ and todd@ like it, ok mikeb@


# 1.80 28-Mar-2013 tedu

no need for a lot of code to include proc.h


Revision tags: OPENBSD_5_3_BASE
# 1.79 25-Feb-2013 dlg

trunk_bcast_start sent packets on all its member interfaces by copying
the mbuf it just sent on the previous interface. this is bad because the
previous interface could have modified the mbuf chain, which can make the
subsequent m_copym()s panic.

this copies the dance that rtsock.c does for broadcasting mbufs which
copies the mbuf before transmit, except for the last interface which it
handles outside the loop.

tested by halex@ who verified it fixes his panic.
ok claudio@ deraadt@
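
The copy-before-transmit pattern described in 1.79 can be sketched in user
space as follows. This is only an illustration of the idea, not the kernel
code: `struct pkt`, `struct port`, `port_send()` and `bcast_send()` are
hypothetical names, and a plain heap buffer stands in for an mbuf chain.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: a heap buffer owned by the sender. */
struct pkt {
	size_t		len;
	unsigned char	data[64];
};

/* Hypothetical port that records what it was asked to transmit. */
struct port {
	int		sent;
	unsigned char	first_byte;
};

/* Transmit consumes the packet and may modify it, as a driver might. */
static void
port_send(struct port *tp, struct pkt *p)
{
	tp->first_byte = p->data[0];
	p->data[0] ^= 0xff;		/* simulate in-place modification */
	tp->sent++;
	free(p);
}

/*
 * Send p on every port. Each port but the last gets a copy taken from
 * the still-untouched original; the last port gets the original itself,
 * outside the loop. Returns 0 on success, -1 on failure (p is freed).
 */
static int
bcast_send(struct port *ports, size_t nports, struct pkt *p)
{
	struct port *last = NULL;
	size_t i;

	assert(p != NULL);
	for (i = 0; i < nports; i++) {
		if (last != NULL) {
			struct pkt *c = malloc(sizeof(*c));

			if (c == NULL) {
				free(p);
				return (-1);
			}
			memcpy(c, p, sizeof(*c));	/* copy before send */
			port_send(last, c);
		}
		last = &ports[i];
	}
	if (last == NULL) {		/* no ports at all */
		free(p);
		return (-1);
	}
	port_send(last, p);		/* original goes out last */
	return (0);
}
```

Because the original is never handed to a port until all copies have been
made, a port that modifies its packet cannot affect what the others send.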


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.78 28-Oct-2011 krw

Take more care to ensure all callbacks are initialized. In particular
tr_linkstate() was not initialized in trunk_rr_attach(), and
tr_init() and tr_stop() were missing in trunk_lb_attach().

Fixes crash triggered by changing trunkproto, reported by Anders
Berggren on bugs@.

ok mpf henning


Revision tags: OPENBSD_4_9_BASE OPENBSD_5_0_BASE
# 1.77 28-Jan-2011 reyk

Fix another memory leak by replacing m_free() with m_freem() in trunk's
broadcast mode.

ok blambert@ mikeb@


# 1.76 12-Nov-2010 dhill

avoid pointer arithmetic on void *
no change in .o

ok claudio


Revision tags: OPENBSD_4_8_BASE
# 1.75 08-May-2010 stsp

Upon changing the MAC address of an if_trunk interface, all ports are switched
to the new MAC. But subsequently added ports were still being assigned the
old MAC address because it was copied from the wrong place. Give newly added
trunk ports the current MAC of the master port, rather than the saved MAC of
the master port. The saved MAC should only be used to restore the original
MAC address of the interface when it is removed from the trunk.

ok claudio@


# 1.74 23-Apr-2010 stsp

Use proper Queen's English in a comment. Drive-by fix, no functional change.


# 1.73 20-Apr-2010 tedu

remove proc.h include from uvm_map.h. This has far reaching effects, as
sysctl.h was reliant on this particular include, and many drivers included
sysctl.h unnecessarily. remove sysctl.h or add proc.h as needed.
ok deraadt


# 1.72 17-Apr-2010 deraadt

use ifnewlladdr() for trunk lladdr changes, too
ok stsp


Revision tags: OPENBSD_4_7_BASE
# 1.71 12-Jan-2010 dlg

set the length of the send queue to 1.

this prevents the ultimate length of the queue of the underlying interface
from being artificially inflated while the vlan/trunk queue is filled and
then dumped wholesale on the underlying interface, which will dump its
massive queue wholesale on the chip.

tx mitigation is only triggered on real interfaces now (which is where the
cost is)

ok beck@ original diff ok kjc@ henning@


# 1.70 18-Nov-2009 deraadt

do not do setup that ether_ifattach() takes care of; ok jsg


# 1.69 17-Sep-2009 claudio

Add an splassert check to trunk_enqueue(); calling this function at anything
below splnet() is a good recipe for doom.
OK henning, reyk, mpf


# 1.68 09-Sep-2009 reyk

remove inline functions and move some code from the trunk_lacp_input()
API function directly to lacp_input() to simplify the code path.

ok mpf@


# 1.67 16-Jul-2009 thib

Backout rev1.79 of if_vlan.c and rev1.66 of if_trunk.c;
Changes in those revision limited the send queue to one slot.

This breaks NFS over vlan(4) as discovered by sthen@.

"just plain back it out." deraadt@


# 1.66 13-Jul-2009 dlg

make the send queue one slot long. this forces packets off the virtual
interfaces down to the queue on the physical interface immediately.

this avoids having the tx mitigation code wasting cpu time dicking around
with simply shuffling packets off virtual interface queues and lets it
do its job of amortising the cost of calling a real interface's start
routine.

it also prevents an artificial inflation of the physical interface's queue
length where packets could hide on the virtual interfaces' queues during
softnet before being dumped en masse onto the hardware. this will smooth
out the rate at which packets are submitted to the hardware.

kjc@ says this has no impact on altq. ya henning@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.65 27-Jan-2009 naddy

handle HW VLAN tags being passed down; from Brad


# 1.64 27-Jan-2009 naddy

make the hardware/no hardware tag stripping cases consistent and don't
hash the VLAN priority; ok henning@


# 1.63 14-Dec-2008 brad

Allow trunk_hashmbuf() to take HW VLAN tagging into consideration.

ok mpf@ naddy@


# 1.62 14-Dec-2008 brad

Since trunk_hashmbuf() and thus trunk_lb_gethdr() are no longer specific
to the loadbalance code rename trunk_lb_gethdr() to just trunk_gethdr().

ok mpf@


# 1.61 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.60 16-Nov-2008 brad

Make sure to increment the output error counter if
not using TRUNK_PROTO_NONE and there are no member ports.

ok mpf@


# 1.59 08-Nov-2008 mpf

Take into account that our ether_input() already strips the
ethernet header. This lets us actually process the incoming
LACP-Packets. It should now work with a lot more switches.
At least a Catalyst 3500 seems happy.
OK brad@


# 1.58 04-Nov-2008 brad

Move the trunk port count check from trunk_lb_start() to trunk_start()
before the protocol start routine is called so as to cover all protocols
with the same check.

ok mpf@


# 1.57 30-Oct-2008 brad

Fix building with !INET6 kernels.


# 1.56 28-Oct-2008 brad

Remove return at the end of a void function.


# 1.55 28-Oct-2008 brad

In trunk_media_status() mark the interface as active if any ports are
active rather than just the primary being UP.

From FreeBSD

Ok mpf@


# 1.54 28-Oct-2008 brad

In trunk_lb_start() port % count will never be greater than
TRUNK_MAX_PORTS so nuke the test.

From FreeBSD

Ok mpf@


# 1.53 28-Oct-2008 brad

Feed IPv6 flow label to hash calculation.

From FreeBSD

Ok mpf@


# 1.52 28-Oct-2008 brad

Show the ACTIVE flag in ifconfig for the single interface that is
actually active in failover mode rather than all interfaces with a
link. This makes it clear if the master interface is in use or one
of the backup links.

From FreeBSD

Tested by jmc@
Ok mpf@


# 1.51 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.50 17-Sep-2008 chl

remove dead stores and newly created unused variables.

fix potential use of uninitialized value in trunk_port_ioctl() function.

Found by LLVM/Clang Static Analyzer.

ok mpf@ henning@


# 1.49 07-Aug-2008 damien

do not touch m after IFQ_ENQUEUE()+if_start().

ok brad@, mpf@, henning@, reyk@


Revision tags: OPENBSD_4_4_BASE
# 1.48 06-Aug-2008 reyk

fix trunk breakage that sneaked in with the lacp diff:

- don't use in-kernel IFMEDIA ioctls in lacp and remove two KASSERTs
that caused reliable panics - the lacp key can be locally assigned and
we don't need to query the media subtype here.

- unbreak failover/loadbalance/broadcast status handling.

Reported by brad@
ok deraadt@


# 1.47 30-Jul-2008 mpf

Prevent a divide by zero panic if trunkproto loadbalance is
used w/out any trunkports. Patch from Dmitri Alenitchev.
OK reyk@
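
The divide-by-zero case fixed in 1.47 comes from reducing a flow hash modulo
the port count. A minimal sketch of the guard, assuming a hypothetical helper
`lb_pick_port()` that is not part of the driver:

```c
#include <stdint.h>

/*
 * Map a flow hash to a port index.
 * Returns -1 when there are no ports, instead of evaluating hash % 0.
 * (Hypothetical helper; the name and return convention are assumptions.)
 */
static int
lb_pick_port(uint32_t hash, unsigned int nports)
{
	if (nports == 0)
		return (-1);		/* nothing to send on */
	return ((int)(hash % nports));	/* always in [0, nports) */
}
```

The result of the modulo is always smaller than the port count, which is
also why a separate upper-bound test on the index is redundant.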


# 1.46 15-Jun-2008 mpf

Add 802.3ad LACP support for trunk(4).
Implementation from NetBSD. Ported via FreeBSD's version in trunk^Wlagg(4).
This is still work in progress. Tested with a HP ProCurve 3500.
OK reyk@


# 1.45 14-Jun-2008 mpf

Move bpf_mtap_hdr() above the trunk_*_input() routines.
This makes it easier to add trunk protocols that consume some packets.
Add a special case for the failover protocol, to prevent shoving
duplicates to bpf. (Not beautiful, but it has to do for the moment).
OK reyk@, claudio@


# 1.44 13-Jun-2008 mpf

Move the responsibility to free received packets on trunked interfaces
from ether_input() into trunk_input() where it can be handled in a smarter way.
OK claudio@ and reyk@ on an earlier version.


# 1.43 08-Jun-2008 brad

Use m_freem() instead of m_free() in trunk_start() to ensure that the
full mbuf chain is being free'd.

ok reyk@
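
The difference between freeing one buffer and freeing the whole chain can be
shown with a user-space sketch. `struct buf`, `buf_chain()`, `buf_free()` and
`buf_freem()` are hypothetical names chosen to mirror the roles of m_free()
and m_freem(); they are not the kernel functions.

```c
#include <stdlib.h>

/* Hypothetical chained buffer, standing in for an mbuf. */
struct buf {
	struct buf	*next;
};

/* Build a chain of n buffers; returns NULL if n is 0 or malloc fails. */
static struct buf *
buf_chain(int n)
{
	struct buf *head = NULL;

	while (n-- > 0) {
		struct buf *b = malloc(sizeof(*b));

		if (b == NULL)
			break;
		b->next = head;
		head = b;
	}
	return (head);
}

/* Free a single buffer and return its successor (the m_free() role). */
static struct buf *
buf_free(struct buf *b)
{
	struct buf *n = b->next;

	free(b);
	return (n);
}

/* Free the entire chain (the m_freem() role); returns buffers freed. */
static int
buf_freem(struct buf *b)
{
	int freed = 0;

	while (b != NULL) {
		b = buf_free(b);
		freed++;
	}
	return (freed);
}
```

Calling only the single-buffer routine on the head and discarding its return
value would leak every remaining buffer in the chain.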


# 1.42 07-May-2008 dlg

enable tx mitigation when putting packets on the wire by switching from
calls to ifp->if_start to if_start(). these are the obviously right cases
where we can do that, the less obvious ones may follow as they're figured
out.

deraadt@ said to go for it


Revision tags: OPENBSD_4_3_BASE
# 1.41 10-Jan-2008 brad

return with ENOTTY instead of EINVAL for unknown ioctl requests to trunk
ports.

ok reyk@ dlg@


# 1.40 26-Nov-2007 martynas

typos; ok jmc@
sys/netinet/in_pcb.c and sys/net/bridgestp.c ok henning@
sys/dev/pci/bktr/* ok jakemsr@


# 1.39 20-Nov-2007 canacar

Fix possible mbuf leak on error. ok reyk@


# 1.38 22-Oct-2007 reyk

use the input mbuf for the first active port instead of copying it in
the broadcast start routine.

ok pyr@


# 1.37 22-Oct-2007 pyr

Add a broadcast mode to trunk(4). This mode sends frames on all
ports and receives frames on any port. This allows interaction with
some L2 configurations.
with input and ok reyk@


# 1.36 15-Sep-2007 henning

malloc sweep:
-remove useless casts
-MALLOC/FREE -> malloc/free
-use M_ZERO where appropriate instead of separate bzero
feedback & ok krw, hshoexer


# 1.35 07-Sep-2007 reyk

use M_ZERO


# 1.34 06-Sep-2007 reyk

bump the copyright while touching these files


# 1.33 06-Sep-2007 reyk

With a trunk(4) interface in fail over mode the trunk(4) interface
will show input errors for packets received from any of the ports that
are part of a fail over interface but are not the "master" port at the
time. This fixes the problem by checking the error condition
correctly.

From brad at comstyle dot com


Revision tags: OPENBSD_4_2_BASE
# 1.32 26-May-2007 jason

one extern seems to be better than 20 for ifqmaxlen; ok krw


# 1.31 26-Apr-2007 reyk

extend the trunk protocol API with some additional callbacks required
for future work. also move the repeated tx start code into a common
function.

parts of it are merged from FreeBSD's trunk(4) port. oh, wait... they
renamed it to 'lagg(4)' because a little green guy from Cizzco-Eeeh
told them "trunk is for VLANs, trunk is for VLANs". Bad FreeBSD, don't
listen to the little green guy from Cizzco-Eeeh!

ok claudio@


Revision tags: OPENBSD_4_1_BASE
# 1.30 31-Jan-2007 reyk

handle the full duplex link state in trunk(4). load sharing trunks
with at least two ports are always handled as full duplex links. this
change will allow trunks as edge ports in a rstp bridge(4).

ok brad@ pyr@


Revision tags: OPENBSD_4_0_BASE
# 1.29 28-May-2006 reyk

check if the interface is active and UP. some, but not all, network
drivers report an active link state even if the interface is DOWN.
this should fix trunk with various ethernet devices.

ok brad@


# 1.28 23-May-2006 reyk

knf and remove an unneeded debug message


# 1.27 23-May-2006 reyk

add


# 1.26 20-May-2006 reyk

bump copyright


# 1.25 20-May-2006 reyk

drop packets received on inactive failover ports without increasing the
error counter. just silently drop them...

figured out by todd@, ok brad@


# 1.24 16-May-2006 reyk

the ifp->if_linkstatehooks may be NULL, add an extra check to avoid
possible kernel panic. this happened to me by using tun(4) in layer 2
mode (link0 flag) as a trunk port for testing.


# 1.23 25-Mar-2006 djm

allow bpf(4) to ignore packets based on their direction (inbound or
outbound), using a new BIOCSDIRFILT ioctl;
guidance, feedback and ok canacar@


# 1.22 11-Mar-2006 brad

splimp -> splnet


# 1.21 04-Mar-2006 brad

With the exception of two other small uncommitted diffs this moves
the remainder of the network stack from splimp to splnet.

ok miod@


Revision tags: OPENBSD_3_9_BASE
# 1.20 04-Jan-2006 brad

Move bpf_mtap_hdr() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

From mcbride@

some testing by todd@, ok reyk@


# 1.19 04-Jan-2006 canacar

Remove redundant calls to bpfdetach.
ok brad@


# 1.18 21-Dec-2005 reyk

knf


# 1.17 21-Dec-2005 reyk

fix possible NULL pointer, thanks to Marco Molteni


# 1.16 18-Dec-2005 reyk

Update my e-mail address in the copyright statement, no binary changes.


# 1.15 17-Dec-2005 brad

revert last commit as it introduced system panics due to improper use
of bpf_mtap().


# 1.14 27-Nov-2005 mcbride

Move bpf_mtap() after trunk_*_input() so that we hopefully see only one
copy of the packet with bpf.

ok reyk@


# 1.13 27-Nov-2005 mcbride

Attempt to accept a packet only once when operating in failover mode.

Makes trunk usable with hubs or switches which don't have actual trunk support.

ok reyk@


# 1.12 27-Nov-2005 mcbride

Fix dereference of uninitialised pointer in trunk_input() error path.

ok reyk@


# 1.11 27-Nov-2005 mcbride

Make the trunk interface link state depend on the link states of the
trunkports (link is UP as long as at least one of the trunkports is up)

ok reyk@


# 1.10 23-Oct-2005 mpf

Rework of multicast deletion code for vlan(4) and trunk(4).
The previous code could wrongly delete multicast groups
on the parent interface. Now we forward only remembered
delete requests.
OK mcbride, mickey.


# 1.9 09-Oct-2005 reyk

use label


# 1.8 03-Oct-2005 reyk

add a simple active "failover" trunk mode. port priorities will be
added later, currently the master port will always be the default
(active) port and the next active port will be used as the
failover port.

ok brad@


# 1.7 14-Sep-2005 reyk

correctly set IFF_RUNNING flag after device state changes.

ok brad@


# 1.6 11-Sep-2005 brad

when adding any IP addresses make sure to UP the interface.

ok reyk@


# 1.5 10-Sep-2005 reyk

update the trunk(4) driver

- add multicast support by passing multicast addresses to the ports.
this is a requirement for carp(4) over trunk(4).

- support the smallest common interface capabilities. ie., this adds
support for VLAN MTUs if all attached ports have this capability.

- add a port_destroy callback to the trunk protocol. this fixes a
potential crash if the master port has been detached while running.

discussed with deraadt@, brad@ and some others.


Revision tags: OPENBSD_3_8_BASE
# 1.4 31-Jul-2005 pascoe

Introduce bpf_mtap_af and bpf_mtap_hdr to be used when passing a mbuf chain
to bpf with either an address family or other header added.

These helpers only allocate a much smaller struct m_hdr on the stack when
needed, rather than leaving 256 byte struct mbufs on the stack in deep
call paths. Also removes a fair bit of duplicated code.

commit now, tune after deraadt@


# 1.3 27-May-2005 reyk

add missing free on error. thanks to Andrey Matveev.


# 1.2 24-May-2005 reyk

support trunk stacking (trunks as trunk ports) and some fixes

ok brad@


# 1.1 24-May-2005 reyk

initial import of a trunking (link aggregation and link failover)
implementation. it currently supports round robin mode with link state
checking, additional modes will be added later.

ok brad@, deraadt@