History log of /freebsd-10.1-release/sys/net/if_lagg.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 272461 02-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 270136 18-Aug-2014 mav

MFC r269492:
Improve locking of multicast addresses in VLAN and LAGG interfaces.

This fixes several scenarios of reproducible panics, cause by races
between multicast address changes and interface destruction.


# 265412 06-May-2014 rmacklem

MFC: r264469, r264498
Lagg did not set the value of if_hw_tsomax, so when lagg
was stacked on top of network interfaces that set if_hw_tsomax,
tcp_output() would see the default value instead of the value
set by the network interface(s). This patch modifies lagg so that
it sets if_hw_tsomax to the minimum of the value(s) for the
underlying network interfaces.


# 260179 01-Jan-2014 scottl

MFC r260070

Multi-queue NIC drivers and multi-port lagg tend to use the same lower
bits of the flowid as each other, resulting in a poor distribution of
packets among queues in certain cases. Work around this by adding a
set of sysctls for controlling a bit-shift on the flowid when doing
multi-port aggrigation in lagg and lacp. By default, lagg/lacp will
now use bits 16 and higher instead of 0 and higher.

Obtained from: Netflix


# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 256218 09-Oct-2013 glebius

There are some high performance NICs that count statistics in hardware,
and there are ifnets, that do that via counter(9). Provide a flag that
would skip cache line trashing '+=' operation in ether_input().

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Reviewed by: melifaro, adrian
Approved by: re (marius)


# 255038 29-Aug-2013 adrian

Convert the if_lagg rwlock to an rmlock.

We've been seeing lots of cache line contention (but not lock contention!)
in our workloads between the various TX and RX threads going on.

The write lock is only grabbed when configuration changes are made - which
are infrequent.

With this patch, the contention and cycles spent waiting for updates
disappear.

Sponsored by: Netflix, Inc.


# 253687 26-Jul-2013 adrian

Break out the static, global LACP debug options into a per-lagg unit
sysctl tree.

* Create a net.link.lagg.X.lacp node
* Add a debug node under that for tx_test and rx_test
* Add lacp_strict_mode, defaulting to 1

tx_test and rx_test are still a bitmap of unit numbers for now.
At some point it would be nice to create child nodes of the lagg bundle
for each sub-interface, and then populate those with various knobs
and statistics.

Sponsored by: Netflix


# 253314 13-Jul-2013 adrian

Bring over some link aggregation / LACP protocol improvements and debugging
additions.

* Add some new tracing events to aid in debugging.
* Add in a debugging mode to drop transmit and received frames, specifically
to test whether seeing or hearing heartbeats correctly cause LACP to
drop the port.
* Add in (and make default) a strict LACP mode, which requires the
heartbeat on a port to be heard before it's used. Sometimes vendor ports
will hang but the link layer stays up, resulting in hung traffic.
* Add logging the number of link status flaps, again to aid in debugging
badly behaving switch ports.
* Calculate the lagg interface port speed as the multiple of the
configured ports, rather than the largest.

Obtained from: Netflix
MFC after: 2 weeks


# 252511 02-Jul-2013 hrs

- Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE
is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV
regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal.
To configure an autoconfigured link-local address (RFC 4862), the
following rc.conf(5) configuration can be used:

ifconfig_bridge0_ipv6="inet6 auto_linklocal"

- if_bridge(4) now removes IPv6 addresses on a member interface to be
added when the parent interface or one of the existing member
interfaces has an IPv6 address. if_bridge(4) merges each link-local
scope zone which the member interfaces form respectively, so it causes
address scope violation. Removal of the IPv6 addresses prevents it.

- if_lagg(4) now removes IPv6 addresses on a member interfaces
unconditionally.

- Set reasonable flags to non-IPv6-capable interfaces. [*]

Submitted by: rpaulo [*]
MFC after: 1 week


# 251859 17-Jun-2013 delphij

Return ENETDOWN instead of ENOENT when all lagg(4) links are
inactive when upper layer tries to transmit packet. This
gives better feedback and meaningful errors for applications.

MFC after: 2 weeks
Reviewed by: thompsa


# 251490 07-Jun-2013 trociny

Properly set curvnet context in lagg_port_setlladdr() task handler.

Reported by: Nikos Vassiliadis <nvass gmx.com>
Submitted by: zec
Tested by: Nikos Vassiliadis <nvass gmx.com>
MFC after: 1 week


# 249925 26-Apr-2013 glebius

Add const qualifier to the dst parameter of the ifnet if_output method.


# 249506 15-Apr-2013 glebius

Switch lagg(4) statistics to counter(9).

The lagg(4) is often used to bond high speed links, so basic per-packet +=
on statistics cause cache misses and statistics loss.

Perfect solution would be to convert ifnet(9) to counters(9), but this
requires much more work, and unfortunately ABI change, so temporarily
patch lagg(4) manually.

We store counters in the softc, and once per second push their values
to legacy ifnet counters.

Sponsored by: Nginx, Inc.


# 248621 22-Mar-2013 glebius

Remove __FreeBSD_version ifdefs.


# 245741 21-Jan-2013 glebius

If lagg(4) can't forward a packet due to underlying port problems,
return much more meaningful ENETDOWN to the stack, instead of EBUSY.


# 241627 17-Oct-2012 delphij

Fix build.


# 241619 16-Oct-2012 emax

report total number of ports for each lagg(4) interface
via net.link.lagg.X.count sysctl

MFC after: 1 week


# 241610 16-Oct-2012 glebius

Make the "struct if_clone" opaque to users of the cloning API. Users
now use function calls:

if_clone_simple()
if_clone_advanced()

to initialize a cloner, instead of macros that initialize if_clone
structure.

Discussed with: brooks, bz, 1 year ago


# 241394 10-Oct-2012 kevlo

Revert previous commit...

Pointyhat to: kevlo (myself)


# 241370 09-Oct-2012 kevlo

Prefer NULL over 0 for pointers


# 240742 20-Sep-2012 glebius

Convert lagg(4) to use if_transmit instead of if_start.

In collaboration with: thompsa, sbruno, fabient


# 237852 30-Jun-2012 thompsa

Add the same check as vlan(4) where we ignore the ifnet departure event if the
interface is just being renamed.

PR: kern/169557
Submitted by: Mark Johnston
MFC after: 3 days


# 236178 28-May-2012 rea

if_lagg: allow to invoke SIOCSLAGGPORT multiple times in a row

Currently, 'ifconfig laggX down' does not remove members from this
lagg(4) interface. So, 'service netif stop laggX' followed by
'service netif start laggX' will choke, because "stop" will leave
interfaces attached to the laggX and ifconfig from the "start" will
refuse to add already-existing interfaces.

The real-world case is when I am bundling together my Ethernet and
WiFi interfaces and using multiple profiles for accessing network in
different places: system being booted up with one profile, but later
this profile being exchanged to another one, followed by 'service
netif restart' will not add WiFi interface back to the lagg: the
"stop" action from 'service netif restart' will shut down my main WiFi
interface, so wlan0 that exists in the lagg0 will be destroyed and
purged from lagg0; the "start" action will try to re-add both
interfaces, but since Ethernet one is already in lagg0, ifconfig will
refuse to add the wlan0 from WiFi interface.

Since adding the interface to the lagg(4) when it is already here
should be an idempotent action: we're really not changing anything,
so this fix doesn't change the semantics of interface addition.

Approved by: thompsa
Reviewed by: emaste
MFC after: 1 week


# 234936 02-May-2012 emaste

Relax restriction on direct tx to child ports

Lagg(4) restricts the type of packet that may be sent directly to a child
port, to avoid undesired output from accidental misconfiguration.
Previously only ETHERTYPE_PAE was permitted.

BPF writes to a lagg(4) child port are presumably intentional, so just
allow them, while still blocking other packets that should take the
aggregation path.

PR: kern/138620
Approved by: thompsa@


# 234163 11-Apr-2012 thompsa

Set the proto to LAGG_PROTO_NONE before calling the detach routine so packets
are discarded, this is an issue because lacp drops the lock which may allow
network threads to access freed memory. Expand the lock coverage so the
detach/attach happen atomically.

Submitted by: Andrew Boyer (earlier version)


# 232640 07-Mar-2012 thompsa

Move the vlan buffer space into the union which also fixes an unused variable
warning with !INET & !INET6.

Spotted by: pluknet


# 232629 06-Mar-2012 thompsa

Add the ability to set which packet layers are used for the load balance hash
calculation.


# 232080 23-Feb-2012 thompsa

Add a sysctl/tunable default value for the use_flowid sysctl in r232008.


# 232008 22-Feb-2012 thompsa

Using the flowid in the mbuf assumes the network card is giving a good hash for
the traffic flow, this may not be the case giving poor traffic distribution.
Add a sysctl which allows us to fall back to our own flow hash code.

PR: kern/164901
Submitted by: Eugene Grosbein
MFC after: 1 week


# 227459 11-Nov-2011 brooks

In r191367 the need for if_free_type() was removed and a new member
if_alloctype was used to store the origional interface type. Take
advantage of this change by removing all existing uses of if_free_type()
in favor of if_free().

MFC after: 1 Month


# 227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 224571 01-Aug-2011 pluknet

Add missing MODULE_VERSION() definition to protect against duplicating
module loads.

PR: kern/159345
Reported by: Eugene Grosbein <egrosbein att rdtc ru>
Tested by: Eugene Grosbein <egrosbein att rdtc ru>
Approved by: re (kib)
MFC after: 1 week


# 223846 07-Jul-2011 thompsa

Grab the rlock before checking if our interface is enabled, it could be
possible to hit a dead pointer when changing interfaces.

PR: kern/156978
Submitted by: Andrew Boyer
MFC after: 1 week


# 221270 30-Apr-2011 thompsa

LACP frames must not be send VLAN-tagged, check for that before processing.

PR: kern/156743
Submitted by: Dmitrij Tejblum
MFC after: 1 week


# 221130 27-Apr-2011 bz

Make various (pseudo) interfaces compile without INET in the kernel
adding appropriate #ifdefs. For module builds the framework needs
adjustments for at least carp.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days


# 219275 04-Mar-2011 eri

Fix a panic that can happen when trying to destroy a lagg(4) with scheduler set to none.

Approved by: thompsa(mentor)
MFC after: 1 week


# 212100 01-Sep-2010 emaste

Add a sysctl knob to accept input packets on any link in a failover lagg.


# 204901 08-Mar-2010 delphij

Remove the check for IFF_DRV_OACTIVE right before adding a port into lagg
interface. The check itself seems to be coming from OpenBSD but does not
seem to be useful for our code.

Discussed with: thomasa
MFC after: 1 month


# 203548 06-Feb-2010 eri

Propagate the vlan eventis to the underlying interfaces/members so they can do initialization of hw related features.

PR: kern/141646
Reviewed by: thompsa
Approved by: thompsa(co-mentor)
MFC after: 2 weeks


# 202588 18-Jan-2010 thompsa

Declare a new EVENTHANDLER called iflladdr_event which signals that the L2
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.

The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.

PR: kern/142927
Submitted by: Nikolay Denev


# 201803 08-Jan-2010 trasz

Stop GCC from complaining about lagg_port_checkstacking() being unused.


# 191692 30-Apr-2009 thompsa

Use the flowid if its available for selecting the tx port.


# 191148 16-Apr-2009 kmacy

Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by: rwatson


# 186255 17-Dec-2008 thompsa

- Protect against sc->sc_primary being null
- Initialise speed where its used


# 186254 17-Dec-2008 thompsa

Update the interface baudrate taking into account the max speed for the
different aggregation protocols.


# 186195 16-Dec-2008 thompsa

Also propagate the if_hwassist value to the parent so that cksum offload works.

Submitted by: Tom Hicks (thicks_averesys.com)


# 185164 22-Nov-2008 kmacy

convert calls to IFQ_HANDOFF to if_transmit


# 183498 30-Sep-2008 glebius

Do not mangle if_oerrors of the underlying interface. This counter
belongs solely to the driver.
We don't lose any statistics with this change, because in a error
case the drop counter on the interface output queue is always incremented.

Reviewed by: thompsa


# 183160 18-Sep-2008 thompsa

Move the protocol and port count checks to outside the loop, these conditions
can not change while we have the lock so no point retesting.


# 183135 18-Sep-2008 thompsa

Make sure there is at least one port to avoid divide by zero when choosing the
tx port.

PR: kern/122794
MFC after: 3 days


# 180249 04-Jul-2008 thompsa

port % count will never be greater than LAGG_MAX_PORTS so nuke the test.


# 177274 16-Mar-2008 thompsa

Switch the LACP state machine over to its own mutex to protect the internals,
this means that it no longer grabs the lagg rwlock. Use two port table arrays
which list the active ports for Tx and switch between them with an atomic op.
Now the lagg rwlock is only exclusively locked for management (ioctls) and
queuing of lacp control frames isnt needed.


# 175005 30-Dec-2007 thompsa

Pass any unmatched slowprotocols frames up the stack instead of dropping them,
there are more subtypes than just LACP.


# 174742 18-Dec-2007 thompsa

- Use the macro to check the port status has it will also test if its
administratively down (!IFF_UP)
- Use the same parameters to lagg_link_active() to get the backup port as in
the output path, this didnt actually matter in practice as sc_primary is
always the first on the port list.

MFC after: 3 days


# 174721 17-Dec-2007 thompsa

Add myself to the copyright.


# 174278 04-Dec-2007 thompsa

Support monitor mode where the frame is discarded after bpf and stats processing.


# 173895 25-Nov-2007 thompsa

Have the lagg interface generate link up/down events, the interface is marked
as up if at least one of its ports also has a link up. This fixes using
carp+lagg together and any other system that relies on linkstate events.

PR: kern/113956
MFC after: 3 days


# 172825 20-Oct-2007 thompsa

Use ETHER_BPF_MTAP so that the vlan tags are visible to bpf(4) when stacked
under a vlan.

MFC after: 3 days


# 172554 12-Oct-2007 thompsa

Fix two panics in lagg.

1. The locking was changed to shared but roundrobin mode still updated a
pointer in the softc with the next tx interface to use. This will panic
under high load. Change this to an atomically incremented sequence number in
order to choose the tx port in round robin.

2. IFQ_HANDOFF will free the mbuf if the queue is full, this will then be freed
again by lagg_start() and panic. Reorganised the error handling and freeing
to fix this.

MFC after: 3 days


# 172020 30-Aug-2007 thompsa

Show the ACTIVE flag in ifconfig for the single interface that is actaully
active in failover mode rather than all interfaces with a link. This makes it
clear if the master interface is in use or one of the backup links.

Found by: Writing the Handbook section
Approved by: re (kensmith)


# 171661 30-Jul-2007 thompsa

- Propagate the largest set of interface capabilities supported by all lagg
ports to the lagg interface.
- Use the MTU from the first interface as the lagg MTU, all extra interfaces
must be the same.

This fixes using a lagg interface for a vlan or enabling jumbo frames, etc.

Approved by: re (kensmith)
MFC After: 3 days


# 171603 26-Jul-2007 thompsa

Avoid holding the softc lock when using copyout().

Reported by: dfr
Approved by: re (rwatson)


# 171247 05-Jul-2007 thompsa

Allow the LACP state to be queried from userland which at the moment is the
actor and partner peer info. Print out the active aggregator and per port data
in verbose mode from ifconfig.

Approved by: re (mux)


# 170599 12-Jun-2007 thompsa

non-functional cleanup
- remove dead code
- use consistent variable names
- gc unused defines
- whitespace cleanup


# 169783 20-May-2007 thompsa

- packets on the input interface were counted twice
- Use IFQ_HANDOFF instead of rolling our own


# 169698 18-May-2007 thompsa

Fix a mbuf leak where sc_start fails or the protocol is none.


# 169688 18-May-2007 thompsa

Fix locking assert where we should hold the reader lock.


# 169583 15-May-2007 thompsa

Fix unused variable error with !INET6

Reported by: Artem Naluzhny, Frank Terhaar-Yonkers


# 169570 15-May-2007 thompsa

Feed ipv6 flowlabel to hash calculation.

Obtained from: NetBSD


# 169569 15-May-2007 thompsa

Change from a mutex to a read/write lock. This allows the tx port to be
selected simultaneously by multiple senders and transmit/receive is not
serialised between aggregated interfaces.


# 169340 07-May-2007 thompsa

- Correctly check if lp_ioctl is null
- Remove lagg_ether_purgemulti as its no longer needed
- Mark the interface as up if any ports are active rather than just the primary


# 169330 06-May-2007 thompsa

The purgemulti call is not needed since all the ports have already been detached.


# 169329 06-May-2007 thompsa

Call if_setlladdr() on the aggregation port from a taskqueue so the softc lock
is not held. The short delay between aggregating the port and setting the MAC
address is fine.


# 169328 06-May-2007 thompsa

Avoid touching various unsafe parts if the interface is disappearing.


# 169327 06-May-2007 thompsa

Change from using if_delmulti() to if_delmulti_ifma() as it simplifies the code
and is safe to use if the ifp has disappeared.

Suggested by: bms


# 169227 03-May-2007 thompsa

- Add a disabled state for ports that can not be aggregated
- Refine check for lacp links, set to disabled if not suitable


# 169204 02-May-2007 thompsa

Set the master flag on the right variable.


# 168793 16-Apr-2007 thompsa

Rename the trunk(4) driver to lagg(4) as it is too similar to vlan trunking.

The name trunk is misused as the networking term trunk means carrying multiple
VLANs over a single connection. The IEEE standard for link aggregation (802.3
section 3) does not talk about 'trunk' at all while it is used throughout IEEE
802.1Q in describing vlans.

The lagg(4) driver provides link aggregation, failover and fault tolerance.

Discussed on: current@