History log of /freebsd-11.0-release/sys/net/if_lagg.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 303975 11-Aug-2016 gjb

Copy stable/11@r303970 to releng/11.0 as part of the 11.0-RELEASE
cycle.

Prune svn:mergeinfo from the new branch, and rename it to RC1.

Update __FreeBSD_version.

Use the quarterly branch for the default FreeBSD.conf pkg(8) repo and
the dvd1.iso packages population.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 302408 08-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 302054 21-Jun-2016 bz

Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by: re (hrs)
Obtained from: projects/vnet
Reviewed by: gnn, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6747


# 298995 03-May-2016 pfg

sys/net*: minor spelling fixes.

No functional change.


# 297610 06-Apr-2016 rpokala

Revert accidental submit of WIP as part of r297609

Pointyhat to: rpokala


# 297609 06-Apr-2016 rpokala

Storage Controller Interface driver - typo in unimplemented macro in
scic_sds_controller_registers.h

s/contoller/controller/

PR: 207336
Submitted by: Tony Narlock <tony @ git-pull.com>


# 295796 19-Feb-2016 araujo

Fix regression introduced on 272446r.

lagg(4) supports the protocol none, where it disables any traffic without
disabling the lagg(4) interface itself.

PR: 206921
Submitted by: Pushkar Kothavade <pushkarbk@gmail.com>
Reviewed by: rpokala
Approved by: bapt (mentor)
MFC after: 3 weeks
Sponsored by: gandi.net
Differential Revision: https://reviews.freebsd.org/D5076


# 294615 23-Jan-2016 araujo

Add an IOCTL rr_limit to let users fine tuning the number of packets to be
sent using roundrobin protocol and set a better granularity and distribution
among the interfaces. Tuning the number of packages sent by interface can
increase throughput and reduce unordered packets as well as reduce SACK.

Example of usage:
# ifconfig bge0 up
# ifconfig bge1 up
# ifconfig lagg0 create
# ifconfig lagg0 laggproto roundrobin laggport bge0 laggport bge1 \
192.168.1.1 netmask 255.255.255.0
# ifconfig lagg0 rr_limit 500

Reviewed by: thompsa, glebius, adrian (old patch)
Approved by: bapt (mentor)
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D540


# 292402 17-Dec-2015 smh

Revert r292275 & r292379

glebius has concerns about these changes so reverting those can be discussed
and addressed.

Sponsored by: Multiplay


# 292275 15-Dec-2015 smh

Fix lagg failover due to missing notifications

When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results is slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire a
ifnet_link_event which are now monitored by rip and nd6.

Upon receiving these events each protocol trigger the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behavour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR: 156226
MFC after: 1 month
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4111


# 290819 14-Nov-2015 melifaro

Move iflladdr_event eventhandler invocation to if_setlladdr.

Suggested by: glebius


# 290239 01-Nov-2015 melifaro

Fix lladdr change propagation for on vlans on top of it.
Fix lladdr update when setting mac address manually.
Fix lladdr_event for slave ports addition.

MFC after: 4 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D4004


# 289400 16-Oct-2015 hrs

Fix a panic when destroying a lagg interface.

Differential Revision: https://reviews.freebsd.org/D3883


# 288980 07-Oct-2015 hrs

Fix a bug that caused reinitialization failure of MAC addresses on
the lagg interface when removing the primary port.

PR: 201916
Differential Revision: https://reviews.freebsd.org/D3301


# 288654 04-Oct-2015 araujo

Remove per complete the fec aggregation protocol.
The remove began with revision r271733.

NOTE: This patch must never be merge to 10-Stable

Reviewed by: glebius
Approved by: bapt (mentor)
Relnotes: Yes
Sponsored by: EuroBSDCon Sweden.
Differential Revision: D3786


# 286700 12-Aug-2015 hiren

Make LAG LACP fast timeout tunable through IOCTL.

Differential Revision: D3300
Submitted by: LN Sundararajan <lakshmi.n at msystechnologies>
Reviewed by: wblock, smh, gnn, hiren, rpokala at panasas
MFC after: 2 weeks
Sponsored by: Panasas


# 280720 26-Mar-2015 ae

Fix a possible mbuf leak on interface departure.

Reported by: Alexandre Martins


# 279891 11-Mar-2015 hselasky

Factor out mbuf hashing code from LAGG driver so that other network
drivers can use it. This avoids some code duplication. Add missing
default case to all switch statements while at it. Also move the
hashing of the IPv6 flow field to layer 4 because the IPv6 flow field
is constant on a per L4 connection basis and not on a per L3 network.

Differential Revision: https://reviews.freebsd.org/D1987
Sponsored by: Mellanox Technologies
MFC after: 1 month


# 277544 23-Jan-2015 will

Improve the distribution of LAGG port traffic.

I edited the original change to retain the use of arc4random() as a seed for
the hashing as a very basic defense against intentional lagg port selection.

The author's original commit message (edited slightly):

sys/net/ieee8023ad_lacp.c
sys/net/if_lagg.c
In lagg_hashmbuf, use the FNV hash instead of the old
hash32_buf. The hash32 family of functions operate one octet
at a time, and when run on a string s of length n, their output
is equivalent to :

----- i=n-1
\
n \ (n-i-1) 32
( seed^ + / 33^ * s[i] ) % 2^
/
----- i=0

The problem is that the last five bytes of input don't get
multiplied by sufficiently many powers of 33 to rollover 2^32.
That means that changing the last few bytes (but obviously not
the very last) of input will always change the value of the
hash by a multiple of 33. In the case of lagg_hashmbuf() with
ipv4 input, the last four bytes are the TCP or UDP port
numbers. Since the output of lagg_hashmbuf is always taken
modulo the port count, and 3 is a common port count for a lagg,
that's bad. It means that the UDP or TCP source port will
never affect which lagg member is selected on a 3-port lagg.

At 10Gbps, I was not able to measure any difference in CPU
consumption between the old and new hash.

Submitted by: asomers (original commit)
Reviewed by: emaste, glebius
MFC after: 1 week
Sponsored by: Spectra Logic
MFSpectraBSD: 1001723 on 2013/08/28 (original)
1114258 on 2015/01/22 (edit)


# 277295 17-Jan-2015 ae

Fix condition and really sort ports. Also add comment describing
the intent of this code.

Reported by: sbruno
MFC after: 1 week
Sponsored by: Yandex LLC


# 275358 01-Dec-2014 hselasky

Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after: 1 month
Sponsored by: Mellanox Technologies


# 273210 17-Oct-2014 hrs

- Fix lladdr configuration which could prevent LACP mode from working.
- Fix LORs when a laggport interface has an IPv6 LLA.

PR: 194321


# 272547 05-Oct-2014 hrs

- Move L2 addr configuration for the primary port to a taskqueue. This fixes
LOR of softc rmlock in iflladdr_event handlers.

- Call if_delmulti_ifma() after LACP_UNLOCK(). This fixes another LOR.

- Fix a panic in lacp_transit_expire().

- Fix a panic in lagg_input() upon shutting down a port.


# 272446 02-Oct-2014 hrs

Separate option handling from SIOC[SG]LAGG to SIOC[SG]LAGGOPTS for
backward compatibility with old ifconfig(8).


# 272386 01-Oct-2014 hrs

Virtualize lagg(4) cloner. This change fixes a panic when tearing down
if_lagg(4) interfaces which were cloned in a vnet jail.

Sysctl nodes which are dynamically generated for each cloned interface
(net.link.lagg.N.*) have been removed, and use_flowid and flowid_shift
ifconfig(8) parameters have been added instead. Flags and per-interface
statistics counters are displayed in "ifconfig -v".

CR: D842


# 272354 01-Oct-2014 glebius

Fix off by one in lagg_port_destroy().

Reported by: "Max N. Boyarov" <zotrix bsd.by>


# 272244 28-Sep-2014 glebius

Finally, convert counters in struct ifnet to counter(9).

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 272242 28-Sep-2014 glebius

Convert to if_inc_counter() last remnantes of bare access to struct ifnet
counters.


# 272211 27-Sep-2014 melifaro

Use underlying ports counters to get lagg statistics instead of
per-packet accounting.
This introduce user-visible changes like aggregating error counters.

Reviewed by: asomers (prev.version), glebius
CR: D781
MFC after: 2 weeks
Sponsored by: Yandex LLC


# 272179 26-Sep-2014 glebius

Remove macros that hide access to struct ifnet fields.


# 272178 26-Sep-2014 glebius

Make all lagg protocol methods live in lagg_protos, not in softc. All
interfaces of a same protocol, use the same methods.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 272176 26-Sep-2014 ae

Keep list of lagg ports sorted by if_index.

Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC


# 272175 26-Sep-2014 glebius

- Whitespace.
- Remove caddr_t.


# 272170 26-Sep-2014 glebius

- Provide lagg_proto_attach(), lagg_proto_detach().
- Make detach a protocol method in lagg_protos.
- Simplify code to lookup protocols.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 272161 26-Sep-2014 glebius

- When reconfiguring protocol on a lagg, first set it to LAGG_PROTO_NONE,
then drop lock, run the attach routines, and then set it to specific
proto. This removes tons of WITNESS warnings.
- Make lagg protocol attach handlers not failing and allocate memory
with M_WAITOK.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 272158 26-Sep-2014 glebius

Make lagg protocols detach methods returning void.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 271946 22-Sep-2014 hselasky

Improve transmit sending offload, TSO, algorithm in general.

The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

Reviewed by: adrian, rmacklem
Sponsored by: Mellanox Technologies
MFC after: 1 week


# 271732 18-Sep-2014 araujo

Add laggproto broadcast, it allows sends frames to all ports of the lagg(4) group
and receives frames on any port of the lagg(4).

Phabric: D549
Reviewed by: glebius, thompsa
Approved by: glebius
Obtained from: OpenBSD
Sponsored by: QNAP Systems Inc.


# 271551 13-Sep-2014 hselasky

Revert r271504. A new patch to solve this issue will be made.

Suggested by: adrian @


# 271504 13-Sep-2014 hselasky

Improve transmit sending offload, TSO, algorithm in general.

The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

MFC after: 1 week
Sponsored by: Mellanox Technologies


# 269799 11-Aug-2014 araujo

- Remove unneeded include.

Phabric: D563
Reviewed by: kevlo
Approved by: kevlo


# 269492 04-Aug-2014 mav

Improve locking of multicast addresses in VLAN and LAGG interfaces.

This fixes several scenarios of reproducible panics, cause by races
between multicast address changes and interface destruction.

MFC after: 2 weeks


# 267992 28-Jun-2014 hselasky

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 267985 27-Jun-2014 gjb

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 267961 27-Jun-2014 hselasky

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 264498 15-Apr-2014 rmacklem

Fix build for non-INET that was broken by r264469.

MFC after: 2 weeks


# 264469 14-Apr-2014 rmacklem

Lagg did not set the value of if_hw_tsomax, so when lagg
was stacked on top of network interfaces that set if_hw_tsomax,
tcp_output() would see the default value instead of the value
set by the network interface(s). This patch modifies lagg so that
it sets if_hw_tsomax to the minimum of the value(s) for the
underlying network interfaces.

Reviewed by: glebius
MFC after: 2 weeks


# 260870 18-Jan-2014 melifaro

Simplify filling sockaddr_dl structure for if_resolvemulti()
callback providers. link_init_sdl() function can be used to
fill most of the parameters. Use caller stack instead of
allocation / freing memory for each request. Do not drop support
for extra-long (probably non-existing) link-layer protocols by
introducing link_alloc_sdl() (used by if_resolvemulti() callback)
and link_free_sdl() (used by caller).
Since this change breaks KBI, MFC requires slightly different approach
(link_init_sdl() auto-allocating buffer if necessary to handle cases
with unmodified if_resolvemulti() callers).

MFC after: 2 weeks


# 260070 30-Dec-2013 scottl

Multi-queue NIC drivers and multi-port lagg tend to use the same lower
bits of the flowid as each other, resulting in a poor distribution of
packets among queues in certain cases. Work around this by adding a
set of sysctls for controlling a bit-shift on the flowid when doing
multi-port aggrigation in lagg and lacp. By default, lagg/lacp will
now use bits 16 and higher instead of 0 and higher.

Reviewed by: max
Obtained from: Netflix
MFC after: 3 days


# 256218 09-Oct-2013 glebius

There are some high performance NICs that count statistics in hardware,
and there are ifnets, that do that via counter(9). Provide a flag that
would skip cache line trashing '+=' operation in ether_input().

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Reviewed by: melifaro, adrian
Approved by: re (marius)


# 255038 29-Aug-2013 adrian

Convert the if_lagg rwlock to an rmlock.

We've been seeing lots of cache line contention (but not lock contention!)
in our workloads between the various TX and RX threads going on.

The write lock is only grabbed when configuration changes are made - which
are infrequent.

With this patch, the contention and cycles spent waiting for updates
disappear.

Sponsored by: Netflix, Inc.


# 253687 26-Jul-2013 adrian

Break out the static, global LACP debug options into a per-lagg unit
sysctl tree.

* Create a net.link.lagg.X.lacp node
* Add a debug node under that for tx_test and rx_test
* Add lacp_strict_mode, defaulting to 1

tx_test and rx_test are still a bitmap of unit numbers for now.
At some point it would be nice to create child nodes of the lagg bundle
for each sub-interface, and then populate those with various knobs
and statistics.

Sponsored by: Netflix


# 253314 13-Jul-2013 adrian

Bring over some link aggregation / LACP protocol improvements and debugging
additions.

* Add some new tracing events to aid in debugging.
* Add in a debugging mode to drop transmit and received frames, specifically
to test whether seeing or hearing heartbeats correctly cause LACP to
drop the port.
* Add in (and make default) a strict LACP mode, which requires the
heartbeat on a port to be heard before it's used. Sometimes vendor ports
will hang but the link layer stays up, resulting in hung traffic.
* Add logging the number of link status flaps, again to aid in debugging
badly behaving switch ports.
* Calculate the lagg interface port speed as the multiple of the
configured ports, rather than the largest.

Obtained from: Netflix
MFC after: 2 weeks


# 252511 02-Jul-2013 hrs

- Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE
is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV
regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal.
To configure an autoconfigured link-local address (RFC 4862), the
following rc.conf(5) configuration can be used:

ifconfig_bridge0_ipv6="inet6 auto_linklocal"

- if_bridge(4) now removes IPv6 addresses on a member interface to be
added when the parent interface or one of the existing member
interfaces has an IPv6 address. if_bridge(4) merges each link-local
scope zone which the member interfaces form respectively, so it causes
address scope violation. Removal of the IPv6 addresses prevents it.

- if_lagg(4) now removes IPv6 addresses on a member interfaces
unconditionally.

- Set reasonable flags to non-IPv6-capable interfaces. [*]

Submitted by: rpaulo [*]
MFC after: 1 week


# 251859 17-Jun-2013 delphij

Return ENETDOWN instead of ENOENT when all lagg(4) links are
inactive when upper layer tries to transmit packet. This
gives better feedback and meaningful errors for applications.

MFC after: 2 weeks
Reviewed by: thompsa


# 251490 07-Jun-2013 trociny

Properly set curvnet context in lagg_port_setlladdr() task handler.

Reported by: Nikos Vassiliadis <nvass gmx.com>
Submitted by: zec
Tested by: Nikos Vassiliadis <nvass gmx.com>
MFC after: 1 week


# 249925 26-Apr-2013 glebius

Add const qualifier to the dst parameter of the ifnet if_output method.


# 249506 15-Apr-2013 glebius

Switch lagg(4) statistics to counter(9).

The lagg(4) is often used to bond high speed links, so basic per-packet +=
on statistics cause cache misses and statistics loss.

Perfect solution would be to convert ifnet(9) to counters(9), but this
requires much more work, and unfortunately ABI change, so temporarily
patch lagg(4) manually.

We store counters in the softc, and once per second push their values
to legacy ifnet counters.

Sponsored by: Nginx, Inc.


# 248621 22-Mar-2013 glebius

Remove __FreeBSD_version ifdefs.


# 245741 21-Jan-2013 glebius

If lagg(4) can't forward a packet due to underlying port problems,
return much more meaningful ENETDOWN to the stack, instead of EBUSY.


# 241627 17-Oct-2012 delphij

Fix build.


# 241619 16-Oct-2012 emax

report total number of ports for each lagg(4) interface
via net.link.lagg.X.count sysctl

MFC after: 1 week


# 241610 16-Oct-2012 glebius

Make the "struct if_clone" opaque to users of the cloning API. Users
now use function calls:

if_clone_simple()
if_clone_advanced()

to initialize a cloner, instead of macros that initialize if_clone
structure.

Discussed with: brooks, bz, 1 year ago


# 241394 10-Oct-2012 kevlo

Revert previous commit...

Pointyhat to: kevlo (myself)


# 241370 09-Oct-2012 kevlo

Prefer NULL over 0 for pointers


# 240742 20-Sep-2012 glebius

Convert lagg(4) to use if_transmit instead of if_start.

In collaboration with: thompsa, sbruno, fabient


# 237852 30-Jun-2012 thompsa

Add the same check as vlan(4) where we ignore the ifnet departure event if the
interface is just being renamed.

PR: kern/169557
Submitted by: Mark Johnston
MFC after: 3 days


# 236178 28-May-2012 rea

if_lagg: allow to invoke SIOCSLAGGPORT multiple times in a row

Currently, 'ifconfig laggX down' does not remove members from this
lagg(4) interface. So, 'service netif stop laggX' followed by
'service netif start laggX' will choke, because "stop" will leave
interfaces attached to the laggX and ifconfig from the "start" will
refuse to add already-existing interfaces.

The real-world case is when I am bundling together my Ethernet and
WiFi interfaces and using multiple profiles for accessing network in
different places: system being booted up with one profile, but later
this profile being exchanged to another one, followed by 'service
netif restart' will not add WiFi interface back to the lagg: the
"stop" action from 'service netif restart' will shut down my main WiFi
interface, so wlan0 that exists in the lagg0 will be destroyed and
purged from lagg0; the "start" action will try to re-add both
interfaces, but since Ethernet one is already in lagg0, ifconfig will
refuse to add the wlan0 from WiFi interface.

Since adding the interface to the lagg(4) when it is already here
should be an idempotent action: we're really not changing anything,
so this fix doesn't change the semantics of interface addition.

Approved by: thompsa
Reviewed by: emaste
MFC after: 1 week


# 234936 03-May-2012 emaste

Relax restriction on direct tx to child ports

Lagg(4) restricts the type of packet that may be sent directly to a child
port, to avoid undesired output from accidental misconfiguration.
Previously only ETHERTYPE_PAE was permitted.

BPF writes to a lagg(4) child port are presumably intentional, so just
allow them, while still blocking other packets that should take the
aggregation path.

PR: kern/138620
Approved by: thompsa@


# 234163 12-Apr-2012 thompsa

Set the proto to LAGG_PROTO_NONE before calling the detach routine so packets
are discarded, this is an issue because lacp drops the lock which may allow
network threads to access freed memory. Expand the lock coverage so the
detach/attach happen atomically.

Submitted by: Andrew Boyer (earlier version)


# 232640 07-Mar-2012 thompsa

Move the vlan buffer space into the union which also fixes an unused variable
warning with !INET & !INET6.

Spotted by: pluknet


# 232629 06-Mar-2012 thompsa

Add the ability to set which packet layers are used for the load balance hash
calculation.


# 232080 23-Feb-2012 thompsa

Add a sysctl/tunable default value for the use_flowid sysctl in r232008.


# 232008 22-Feb-2012 thompsa

Using the flowid in the mbuf assumes the network card is giving a good hash for
the traffic flow, this may not be the case giving poor traffic distribution.
Add a sysctl which allows us to fall back to our own flow hash code.

PR: kern/164901
Submitted by: Eugene Grosbein
MFC after: 1 week


# 227459 11-Nov-2011 brooks

In r191367 the need for if_free_type() was removed and a new member
if_alloctype was used to store the origional interface type. Take
advantage of this change by removing all existing uses of if_free_type()
in favor of if_free().

MFC after: 1 Month


# 227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# 224571 01-Aug-2011 pluknet

Add missing MODULE_VERSION() definition to protect against duplicating
module loads.

PR: kern/159345
Reported by: Eugene Grosbein <egrosbein att rdtc ru>
Tested by: Eugene Grosbein <egrosbein att rdtc ru>
Approved by: re (kib)
MFC after: 1 week


# 223846 07-Jul-2011 thompsa

Grab the rlock before checking if our interface is enabled, it could be
possible to hit a dead pointer when changing interfaces.

PR: kern/156978
Submitted by: Andrew Boyer
MFC after: 1 week


# 221270 30-Apr-2011 thompsa

LACP frames must not be send VLAN-tagged, check for that before processing.

PR: kern/156743
Submitted by: Dmitrij Tejblum
MFC after: 1 week


# 221130 27-Apr-2011 bz

Make various (pseudo) interfaces compile without INET in the kernel
adding appropriate #ifdefs. For module builds the framework needs
adjustments for at least carp.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days


# 219275 04-Mar-2011 eri

Fix a panic that can happen when trying to destroy a lagg(4) with scheduler set to none.

Approved by: thompsa(mentor)
MFC after: 1 week


# 212100 01-Sep-2010 emaste

Add a sysctl knob to accept input packets on any link in a failover lagg.


# 204901 09-Mar-2010 delphij

Remove the check for IFF_DRV_OACTIVE right before adding a port into lagg
interface. The check itself seems to be coming from OpenBSD but does not
seem to be useful for our code.

Discussed with: thomasa
MFC after: 1 month


# 203548 06-Feb-2010 eri

Propagate the vlan eventis to the underlying interfaces/members so they can do initialization of hw related features.

PR: kern/141646
Reviewed by: thompsa
Approved by: thompsa(co-mentor)
MFC after: 2 weeks


# 202588 18-Jan-2010 thompsa

Declare a new EVENTHANDLER called iflladdr_event which signals that the L2
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.

The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.

PR: kern/142927
Submitted by: Nikolay Denev


# 201803 08-Jan-2010 trasz

Stop GCC from complaining about lagg_port_checkstacking() being unused.


# 191692 30-Apr-2009 thompsa

Use the flowid if its available for selecting the tx port.


# 191148 16-Apr-2009 kmacy

Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by: rwatson


# 186255 17-Dec-2008 thompsa

- Protect against sc->sc_primary being null
- Initialise speed where its used


# 186254 17-Dec-2008 thompsa

Update the interface baudrate taking into account the max speed for the
different aggregation protocols.


# 186195 16-Dec-2008 thompsa

Also propagate the if_hwassist value to the parent so that cksum offload works.

Submitted by: Tom Hicks (thicks_averesys.com)


# 185164 22-Nov-2008 kmacy

convert calls to IFQ_HANDOFF to if_transmit


# 183498 30-Sep-2008 glebius

Do not mangle if_oerrors of the underlying interface. This counter
belongs solely to the driver.
We don't lose any statistics with this change, because in a error
case the drop counter on the interface output queue is always incremented.

Reviewed by: thompsa


# 183160 18-Sep-2008 thompsa

Move the protocol and port count checks to outside the loop, these conditions
can not change while we have the lock so no point retesting.


# 183135 18-Sep-2008 thompsa

Make sure there is at least one port to avoid divide by zero when choosing the
tx port.

PR: kern/122794
MFC after: 3 days


# 180249 04-Jul-2008 thompsa

port % count will never be greater than LAGG_MAX_PORTS so nuke the test.


# 177274 16-Mar-2008 thompsa

Switch the LACP state machine over to its own mutex to protect the internals,
this means that it no longer grabs the lagg rwlock. Use two port table arrays
which list the active ports for Tx and switch between them with an atomic op.
Now the lagg rwlock is only exclusively locked for management (ioctls) and
queuing of lacp control frames isnt needed.


# 175005 31-Dec-2007 thompsa

Pass any unmatched slowprotocols frames up the stack instead of dropping them,
there are more subtypes than just LACP.


# 174742 18-Dec-2007 thompsa

- Use the macro to check the port status has it will also test if its
administratively down (!IFF_UP)
- Use the same parameters to lagg_link_active() to get the backup port as in
the output path, this didnt actually matter in practice as sc_primary is
always the first on the port list.

MFC after: 3 days


# 174721 17-Dec-2007 thompsa

Add myself to the copyright.


# 174278 05-Dec-2007 thompsa

Support monitor mode where the frame is discarded after bpf and stats processing.


# 173895 25-Nov-2007 thompsa

Have the lagg interface generate link up/down events, the interface is marked
as up if at least one of its ports also has a link up. This fixes using
carp+lagg together and any other system that relies on linkstate events.

PR: kern/113956
MFC after: 3 days


# 172825 20-Oct-2007 thompsa

Use ETHER_BPF_MTAP so that the vlan tags are visible to bpf(4) when stacked
under a vlan.

MFC after: 3 days


# 172554 12-Oct-2007 thompsa

Fix two panics in lagg.

1. The locking was changed to shared but roundrobin mode still updated a
pointer in the softc with the next tx interface to use. This will panic
under high load. Change this to an atomically incremented sequence number in
order to choose the tx port in round robin.

2. IFQ_HANDOFF will free the mbuf if the queue is full, this will then be freed
again by lagg_start() and panic. Reorganised the error handling and freeing
to fix this.

MFC after: 3 days


# 172020 30-Aug-2007 thompsa

Show the ACTIVE flag in ifconfig for the single interface that is actaully
active in failover mode rather than all interfaces with a link. This makes it
clear if the master interface is in use or one of the backup links.

Found by: Writing the Handbook section
Approved by: re (kensmith)


# 171661 30-Jul-2007 thompsa

- Propagate the largest set of interface capabilities supported by all lagg
ports to the lagg interface.
- Use the MTU from the first interface as the lagg MTU, all extra interfaces
must be the same.

This fixes using a lagg interface for a vlan or enabling jumbo frames, etc.

Approved by: re (kensmith)
MFC After: 3 days


# 171603 26-Jul-2007 thompsa

Avoid holding the softc lock when using copyout().

Reported by: dfr
Approved by: re (rwatson)


# 171247 05-Jul-2007 thompsa

Allow the LACP state to be queried from userland which at the moment is the
actor and partner peer info. Print out the active aggregator and per port data
in verbose mode from ifconfig.

Approved by: re (mux)


# 170599 12-Jun-2007 thompsa

non-functional cleanup
- remove dead code
- use consistent variable names
- gc unused defines
- whitespace cleanup


# 169783 20-May-2007 thompsa

- packets on the input interface were counted twice
- Use IFQ_HANDOFF instead of rolling our own


# 169698 19-May-2007 thompsa

Fix a mbuf leak where sc_start fails or the protocol is none.


# 169688 18-May-2007 thompsa

Fix locking assert where we should hold the reader lock.


# 169583 15-May-2007 thompsa

Fix unused variable error with !INET6

Reported by: Artem Naluzhny, Frank Terhaar-Yonkers


# 169570 15-May-2007 thompsa

Feed ipv6 flowlabel to hash calculation.

Obtained from: NetBSD


# 169569 15-May-2007 thompsa

Change from a mutex to a read/write lock. This allows the tx port to be
selected simultaneously by multiple senders and transmit/receive is not
serialised between aggregated interfaces.


# 169340 07-May-2007 thompsa

- Correctly check if lp_ioctl is null
- Remove lagg_ether_purgemulti as its no longer needed
- Mark the interface as up if any ports are active rather than just the primary


# 169330 07-May-2007 thompsa

The purgemulti call is not needed since all the ports have already been detached.


# 169329 07-May-2007 thompsa

Call if_setlladdr() on the aggregation port from a taskqueue so the softc lock
is not held. The short delay between aggregating the port and setting the MAC
address is fine.


# 169328 07-May-2007 thompsa

Avoid touching various unsafe parts if the interface is disappearing.


# 169327 07-May-2007 thompsa

Change from using if_delmulti() to if_delmulti_ifma() as it simplifies the code
and is safe to use if the ifp has disappeared.

Suggested by: bms


# 169227 03-May-2007 thompsa

- Add a disabled state for ports that can not be aggregated
- Refine check for lacp links, set to disabled if not suitable


# 169204 02-May-2007 thompsa

Set the master flag on the right variable.


# 168793 17-Apr-2007 thompsa

Rename the trunk(4) driver to lagg(4) as it is too similar to vlan trunking.

The name trunk is misused as the networking term trunk means carrying multiple
VLANs over a single connection. The IEEE standard for link aggregation (802.3
section 3) does not talk about 'trunk' at all while it is used throughout IEEE
802.1Q in describing vlans.

The lagg(4) driver provides link aggregation, failover and fault tolerance.

Discussed on: current@