History log of /freebsd-current/sys/net/iflib.h
Revision Date Author Comments
# ed34a6b6 17-Jan-2023 Eric Joyner <erj@FreeBSD.org>

iflib: Add subinterface interrupt allocation function

The ice(4) driver will add the ability to create extra interfaces
that hang off of the base interface; to do that the driver requires
a method for the subinterface to request hardware interrupt resources
from the base interface.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D39930


# 3c7da27a 22-Mar-2023 Eric Joyner <erj@FreeBSD.org>

iflib: Add sysctl to request extra MSIX vectors on driver load

Intended to be used with upcoming feature to add sub-interfaces, since
those new interfaces will be dynamically created and will need to have
spare MSI-X interrupts already allocated for them on driver load.

This sysctl is marked as a tunable since it will need to be set before
the driver is loaded since MSI-X interrupt allocation and setup is
done during the attach process.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D41326


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 7f527d48 04-Aug-2023 Eric Joyner <erj@FreeBSD.org>

iflib: Fix white space and reduce some line lengths

This helps align some of the code with the rest of the style used in
iflib, but as marius@ points out, this is not style(9).

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reviewed by: kbowling@
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D41324


# 7ff9ae90 03-Aug-2023 Marius Strobl <marius@FreeBSD.org>

iflib(9): Remove support for cloning pseudo interfaces

This code was used by the first incarnation of wg(4) and is dead ever
since f187d6dfbf633665ba6740fe22742aec60ce02a2 has removed the latter
again. Moreover, this code matched iflib(4) like a square peg fits in
a round hole, was incomplete and despite some hacks still tailored to
VPC and wg(4) but not generic. In effect, this reverts the following:
09f6ff4f1a47c3009dc16fdc609a44f2341bc7ac (w/ its "ancillary changes")
9aeca21324f481f57f2ecb7009f461f4f51b62b3
1f93e931d9f0c688f43f98ef777e04636a325526
0f9544d03e89d180f94a7a84b110ec7d2b6c625a
0dd691b41276ce13d25ffb1443af27f85038aa3f

Reviewed by: erj, kbowling
Differential Revision: <https://reviews.freebsd.org/D41196>


# 9c950139 17-Oct-2022 Eric Joyner <erj@FreeBSD.org>

iflib: Introduce v2 of TX Queue Select Functionality

For v2, iflib will parse packet headers before queueing a packet.

This commit also adds a new field in the structure that holds parsed
header information from packets; it stores the IP ToS/traffic class
field found in the IPv4/IPv6 header.

To help, it will only partially parse header packets before queueing
them by using a new header parsing function that does less than the
current parsing header function; for our purposes we only need up to the
minimal IP header in order to get the IP ToS infromation and don't need
to pull up more data.

For now, v1 and v2 co-exist in this patch; v1 still offers a
less-invasive method where none of the packet is parsed in iflib before
queueing.

This also bumps the sys/param.h version.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Tested by: IntelNetworking
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D34742


# 213e9139 29-Jul-2021 Eric Joyner <erj@FreeBSD.org>

iflib: Allow drivers to determine which queue to TX on

Adds a new function pointer to struct if_txrx in order to allow
drivers to set their own function that will determine which queue
a packet should be sent on.

Since this includes a kernel ABI change, bump the __FreeBSD_version
as well.

(This motivation behind this is to allow the driver to examine the
UP in the VLAN tag and determine which queue to TX on based on
that, in support of HW TX traffic shaping.)

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reviewed by: kbowling@, stallamr@netapp.com
Tested by: jeffrey.e.pieper@intel.com
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D31485


# 58632fa7 19-May-2021 Marcin Wojtas <mw@FreeBSD.org>

iflib: Add a new quirk

ENETC NIC found in LS1028A has a bug where clearing TX pidx/cidx
causes the ring to hang after being re-enabled.
Add a new flag, if set iflib will preserve the indices during restart.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: gallatin, erj
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30728


# ffe3def9 07-Mar-2021 Mark Johnston <markj@FreeBSD.org>

iflib: Make if_shared_ctx_t a pointer to const

This structure is shared among multiple instances of a driver, so we
should ensure that it doesn't somehow get treated as if there's a
separate instance per interface. This is especially important for
software-only drivers like wg.

DEVICE_REGISTER() still returns a void * and so the per-driver sctx
structures are not yet defined with the const qualifier.

Reviewed by: gallatin, erj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29102


# 09c3f04f 02-Mar-2021 Marcin Wojtas <mw@FreeBSD.org>

iflib: add support for admin completion queues

For interfaces with admin completion queues, introduce a new devmethod
IFDI_ADMIN_COMPLETION_HANDLE and a corresponding flag IFLIB_HAS_ADMINCQ.

This provides an option for handling any admin cq logic, which cannot be
run from an interrupt context.

Said method is called from within iflib's admin task, making it safe to
sleep.

Reviewed by: mmacy
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D28708


# 6dd69f00 24-Feb-2021 Marcin Wojtas <mw@FreeBSD.org>

iflib: introduce isc_dma_width

Some DMA controllers are unable to address the full host memory space
and are instead limited to a subset of address range (e.g. 48-bit).

Allow the driver to specify the maximum allowed DMA addressing width
(in bits) for the NIC hardware, by introducing a new field in
if_softc_ctx.

If said field is omitted (set to 0), the lowaddr of DMA window bounds
defaults to BUS_SPACE_MAXADDR.

Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D28706


# 81be6552 18-Dec-2020 Matt Macy <mmacy@FreeBSD.org>

iflib: ensure that tx interrupts enabled and cleanups

Doing a 'dd' over iscsi will reliably cause stalls. Tx
cleaning _should_ reliably happen as data is sent.
However, currently if the transmit queue fills it will
wait until the iflib timer (hz/2) runs.

This change causes the the tx taskq thread to be run
if there are completed descriptors.

While here:

- make timer interrupt delay a sysctl

- simplify txd_db_check handling

- comment on INTR types

Background on the change:

Initially doorbell updates were minimized by only writing to the register
on every fourth packet. If txq_drain would return without writing to the
doorbell it scheduled a callout on the next tick to do the doorbell write
to ensure that the write otherwise happened "soon". At that time a sysctl
was added for users to avoid the potential added latency by simply writing
to the doorbell register on every packet. This worked perfectly well for
e1000 and ixgbe ... and appeared to work well on ixl. However, as it
turned out there was a race to this approach that would lockup the ixl MAC.
It was possible for a lower producer index to be written after a higher one.
On e1000 and ixgbe this was harmless - on ixl it was fatal. My initial
response was to add a lock around doorbell writes - fixing the problem but
adding an unacceptable amount of lock contention.

The next iteration was to use transmit interrupts to drive delayed doorbell
writes. If there were no packets in the queue all doorbell writes would be
immediate as the queue started to fill up we could delay doorbell writes
further and further. At the start of drain if we've cleaned any packets we
know we've moved the state machine along and we write the doorbell (an
obvious missing optimization was to skip that doorbell write if db_pending
is zero). This change required that tx interrupts be scheduled periodically
as opposed to just when the hardware txq was full. However, that just leads
to our next problem.

Initially dedicated msix vectors were used for both tx and rx. However, it
was often possible to use up all available vectors before we set up all the
queues we wanted. By having rx and tx share a vector for a given queue we
could halve the number of vectors used by a given configuration. The problem
here is that with this change only e1000 passed the necessary value to have
the fast interrupt drive tx when appropriate.

Reported by: mav@
Tested by: mav@
Reviewed by: gallatin@
MFC after: 1 month
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D27683


# 662c1305 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

net: clean up empty lines in .c and .h files


# 6d84e76a 12-Aug-2020 Vincenzo Maffione <vmaffione@FreeBSD.org>

iflib: netmap: improve rxsync to support IFLIB_HAS_RXCQ

For drivers with IFLIB_HAS_RXCQ set, there is a separate completion
queue. In this case, the netmap rxsync routine needs to update
rxq->ifr_cq_cidx in the same way it is updated by iflib_rxeof().
This improves the situation for vmx(4) and bnxt(4) drivers, which
use iflib and have the IFLIB_HAS_RXCQ bit set.

PR: 248494
MFC after: 3 weeks


# b256d25c 06-Jul-2020 Mark Johnston <markj@FreeBSD.org>

iflib: Fix some nits in the rx refill code.

- Get rid of the ifl_vm_addrs array. It is not used by any existing
consumer, so we are just dirtying a couple of cache lines for no
reason.
- Use uma_zalloc(fl->ifl_zone) instead of m_cljget(). Otherwise
m_cljget() is doing unnecessary work to look up the correct zone, when
iflib already knows what that zone is.
- ifl_gen is only used when INVARIANTS is on, so make that more clear.
- Fix some style nits and inconsistencies.

Reviewed by: gallatin
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25490


# 9aeca213 21-Jun-2020 Matt Macy <mmacy@FreeBSD.org>

iflib: fix cloneattach fail and generalize pseudo device handling

- a cloneattach failure will not currently be handled correctly,
jump to the right target

- pseudo devices are all treat as if they're ethernet devices -
this often doesn't make sense

MFC after: 1 week
Sponsored by: Netgate, Inc.
Differential Revision: https://reviews.freebsd.org/D25083


# 45818bf1 27-Apr-2020 Eric Joyner <erj@FreeBSD.org>

iflib: Stop interface before (un)registering VLAN

This patch is intended to solve a specific problem that iavf(4)
encounters, but what it does can be extended to solve other issues.

To summarize the iavf(4) issue, if the PF driver configures VLAN
anti-spoof, then the VF driver needs to make sure no untagged traffic is
sent if a VLAN is configured, and vice-versa. This can be an issue when
a VLAN is being registered or unregistered, e.g. when a packet may be on
the ring with a VLAN in it, but the VLANs are being unregistered. This
can cause that tagged packet to go out and cause an MDD event.

To fix this, include a new interface-dependent function that drivers can
implement named IFDI_NEEDS_RESTART(). Right now, this function is called
in iflib_vlan_unregister/register() to determine whether the interface
needs to be stopped and started when a VLAN is registered or
unregistered. The default return value of IFDI_NEEDS_RESTART() is true,
so this fixes the MDD problem that iavf(4) encounters, since the
interface rings are flushed during a stop/init.

A future change to iavf(4) will implement that function just in case the
default value changes, and to make it explicit that this interface reset
is required when a VLAN is added or removed.

Reviewed by: gallatin@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D22086


# 704101dd 24-Mar-2020 Conrad Meyer <cem@FreeBSD.org>

Fix PNP matching for iflib NIC drivers

The previous descriptor string specified that all fields were significant for
match. However, the only significant fields for in-tree drivers are
vendor:devid, and the fictitious zero values constructed by PVID() did not
match real subvendor, subdevice, revision, and/or class values, resulting in no
automatic probe.

If a future iflib driver needs to match on other criteria, the descriptor
string can be updated accordingly. (E.g., "V32" and ~0 for unspecified values
in PVID().)

Reported by: mav
Sponsored by: Dell EMC Isilon


# b3813609 14-Mar-2020 Patrick Kelsey <pkelsey@FreeBSD.org>

Allow iflib drivers to specify the buffer size used for each receive queue

Reviewed by: erj, gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23947


# 41669133 30-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Add IFLIB_SINGLE_IRQ_RX_ONLY.

As of r347221 the iflib legacy interrupt mode setup assumes that drivers
perform both receive and transmit processing from the interrupt handler.
This assumption is invalid in the vmxnet3 driver, so introduce the
IFLIB_SINGLE_IRQ_RX_ONLY flag to make iflib avoid tx processing in the
interrupt handler.

PR: 239118
Reported and tested by: Juraj Lutter <otis@sk.freebsd.org>
Obtained from: marius
Reviewed by: gallatin
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21831


# d49e83ea 15-Jun-2019 Marius Strobl <marius@FreeBSD.org>

- Replace unused and only ever written to members of public iflib(9)
structs with placeholders (in the latter case, IFLIB_MAX_TX_BYTES
etc. are also only ever used for these write-only members if at all,
so both these macros and members can just go). Using these spares
may render it possible to merge certain iflib(9) fixes to stable/12.
Otherwise, changes extending struct if_irq or struct if_shared_ctx
in any way would break KBI as instances of these are allocated by
the driver front-ends (by contrast, struct if_pkt_info as well as
struct if_softc_ctx instances are provided by iflib(9) and, thus,
may grow at least at the end without breaking KBI).
- Make the pvi_name in struct pci_vendor_info const char * as device
identifiers in hardware lookup tables aren't to be expected to ever
change at runtime.
- Similarly, make the pci_vendor_info_t of struct if_shared_ctx which
is used to point to the struct pci_vendor_info arrays provided by
the driver front-ends const.
- Remove the ETH_ADDR_LEN macro from iflib.h; this was duplicating
ETHER_ADDR_LEN of <net/ethernet.h> with iflib(9) actually only
consuming the latter macro.
- Make the name argument of iflib_io_tqg_attach(9) const, matching
the taskqgroup_attach_cpu(9) this function wraps as well as e. g.
iflib_config_gtask_init(9).
- Remove the orphaned iflib_qset_lock_get() prototype.
- Remove some extraneous empty lines.


# 668d6dbb 29-May-2019 Eric Joyner <erj@FreeBSD.org>

iflib: provide probe wrapper for vendor drivers

From Jake:
Vendor drivers that exist out-of-tree generally should return
BUS_PROBE_VENDOR from their device probe functions. This helps ensure
that a vendor replacement driver will supersede the in-kernel driver for
a given device.

Currently, if a vendor wants to implement a driver based on iflib, it
will always report BUS_PROBE_DEFAULT.

Add a wrapper function, iflib_device_probe_vendor() which can be used in
place of iflib_device_probe(). This function will just return
BUS_PROBE_VENDOR whenever iflib_device_probe() would return
BUS_PROBE_DEFAULT.

While vendor drivers can already implement such a wrapper themselves,
providing it in the iflib.h header makes it easier for the vendor driver
to do the right thing.

Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: erj@, gallatin@, marius@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D20221


# 1722eeac 06-May-2019 Marius Strobl <marius@FreeBSD.org>

- Remove the unused ifc_link_irq and ifc_mtx_name members of struct iflib_ctx.
- Remove the only ever written to ift_db_mtx_name member of struct iflib_txq.
- Remove the unused or only ever written to ifr_size, ifr_cq_pidx, ifr_cq_gen
and ifr_lro_enabled members of struct iflib_rxq.
- Consistently spell DMA, RX and TX uppercase in comments, messages etc.
instead of mixing with some lowercase variants.
- Consistently use if_t instead of a mix of if_t and struct ifnet pointers.
- Bring the function comments of _iflib_fl_refill(), iflib_rx_sds_free() and
iflib_fl_setup() in line with reality.
- Judging problem reports, people are wondering what on earth messages like:
"TX(0) desc avail = 1024, pidx = 0"
are trying to indicate. Thus, extend this string to be more like that of
non-iflib(4) Ethernet MAC drivers, notifying about a watchdog timeout due
to which the interface will be reset.
- Take advantage of the M_HAS_VLANTAG macro.
- Use false/true rather than FALSE/TRUE for variables of type bool.
- Use FALLTHROUGH as advocated by style(9).


# e2621d96 03-May-2019 Matt Macy <mmacy@FreeBSD.org>

Allow iflib drivers to pass a pointer to their own ifmedia structure.

Tested by: emaste@

Differential Revision: https://reviews.freebsd.org/D19946


# 10a1e981 19-Mar-2019 Eric Joyner <erj@FreeBSD.org>

iflib: mark isc_driver_version as constant

From Jake:
The iflib core never modifies the isc_driver_version string. Allow
drivers to safely assign pointers to constant buffers by marking this
parameter const.

Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: erj@, gallatin@, jhb@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D19577


# 1b9d9394 19-Mar-2019 Eric Joyner <erj@FreeBSD.org>

iflib: expose the Rx mbuf buffer size to drivers

From Jake:
iflib_fl_setup calculates a suitable buffer size for the Rx mbufs based
on the isc_max_frame_size value that drivers setup. This calculation is
repeated by drivers when programming their hardware with the size of
each Rx buffer.

This can lead to a mismatch where the iflib mbuf size is different from
the expected size of the buffer as programmed by the hardware. This can
lead to unexpected results.

If iflib ever wants to support mbuf sizes larger than one page, every
driver must be updated to account for the new possible buffer sizes.

Fix this by calculating the mbuf size prior to calling IFDI_INIT, and
adding the iflib_get_rx_mbuf_sz function which will expose this value to
drivers, so that they do not repeat the same calculation.

Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: shurd@, erj@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D19489


# 8f82136a 21-Jan-2019 Patrick Kelsey <pkelsey@FreeBSD.org>

onvert vmx(4) to being an iflib driver.

Also, expose IFLIB_MAX_RX_SEGS to iflib drivers and add
iflib_dma_alloc_align() to the iflib API.

Performance is generally better with the tunable/sysctl
dev.vmx.<index>.iflib.tx_abdicate=1.

Reviewed by: shurd
MFC after: 1 week
Relnotes: yes
Sponsored by: RG Nets
Differential Revision: https://reviews.freebsd.org/D18761


# 77c1fcec 12-Oct-2018 Eric Joyner <erj@FreeBSD.org>

ixl/iavf(4): Change ixlv to iavf and update it to use iflib(9)

Finishes the conversion of the 40Gb Intel Ethernet drivers to iflib(9) for
FreeBSD 12.0, and fixes numerous bugs in both ixl(4) and iavf(4).

This commit also re-adds the VF driver to GENERIC since it now compiles and
functions.

The VF driver name was changed from ixlv(4) to iavf(4) because the VF driver is
now intended to be used with future products, not just with Fortville/Fort Park
VFs.

A man page update that documents these drivers is forthcoming in a separate
commit.

Reviewed by: sbruno@, kbowling@
Tested by: jeffrey.e.pieper@intel.com
Approved by: re (gjb@)
Relnotes: yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D16429


# 329e817f 26-Sep-2018 Warner Losh <imp@FreeBSD.org>

Reapply, with minor tweaks, r338025, from the original commit:

Remove unused and easy to misuse PNP macro parameter

Inspired by r338025, just remove the element size parameter to the
MODULE_PNP_INFO macro entirely. The 'table' parameter is now required to
have correct pointer (or array) type. Since all invocations of the macro
already had this property and the emitted PNP data continues to include the
element size, there is no functional change.

Mostly done with the coccinelle 'spatch' tool:

$ cat modpnpsize0.cocci
@normaltables@
identifier b,c;
expression a,d,e;
declarer MODULE_PNP_INFO;
@@
MODULE_PNP_INFO(a,b,c,d,
-sizeof(d[0]),
e);

@singletons@
identifier b,c,d;
expression a;
declarer MODULE_PNP_INFO;
@@
MODULE_PNP_INFO(a,b,c,&d,
-sizeof(d),
1);

$ rg -l MODULE_PNP_INFO -- sys | \
xargs spatch --in-place --sp-file modpnpsize0.cocci

(Note that coccinelle invokes diff(1) via a PATH search and expects diff to
tolerate the -B flag, which BSD diff does not. So I had to link gdiff into
PATH as diff to use spatch.)

Tinderbox'd (-DMAKE_JUST_KERNELS).
Approved by: re (glen)


# b8e771e9 18-Aug-2018 Conrad Meyer <cem@FreeBSD.org>

Back out r338035 until Warner is finished churning GSoC PNP patches

I was not aware Warner was making or planning to make forward progress in
this area and have since been informed of that.

It's easy to apply/reapply when churn dies down.


# faa31943 18-Aug-2018 Conrad Meyer <cem@FreeBSD.org>

Remove unused and easy to misuse PNP macro parameter

Inspired by r338025, just remove the element size parameter to the
MODULE_PNP_INFO macro entirely. The 'table' parameter is now required to
have correct pointer (or array) type. Since all invocations of the macro
already had this property and the emitted PNP data continues to include the
element size, there is no functional change.

Mostly done with the coccinelle 'spatch' tool:

$ cat modpnpsize0.cocci
@normaltables@
identifier b,c;
expression a,d,e;
declarer MODULE_PNP_INFO;
@@
MODULE_PNP_INFO(a,b,c,d,
-sizeof(d[0]),
e);

@singletons@
identifier b,c,d;
expression a;
declarer MODULE_PNP_INFO;
@@
MODULE_PNP_INFO(a,b,c,&d,
-sizeof(d),
1);

$ rg -l MODULE_PNP_INFO -- sys | \
xargs spatch --in-place --sp-file modpnpsize0.cocci

(Note that coccinelle invokes diff(1) via a PATH search and expects diff to
tolerate the -B flag, which BSD diff does not. So I had to link gdiff into
PATH as diff to use spatch.)

Tinderbox'd (-DMAKE_JUST_KERNELS).


# 7f87c040 15-Jul-2018 Marius Strobl <marius@FreeBSD.org>

Assorted TSO fixes for em(4)/iflib(9) and dead code removal:
- Ever since the workaround for the silicon bug of TSO4 causing MAC hangs
was committed in r295133, CSUM_TSO always got disabled unconditionally
by em(4) on the first invocation of em_init_locked(). However, even with
that problem fixed, it turned out that for at least e. g. 82579 not all
necessary TSO workarounds are in place, still causing MAC hangs even at
Gigabit speed. Thus, for stable/11, TSO usage was deliberately disabled
in r323292 (r323293 for stable/10) for the EM-class by default, allowing
users to turn it on if it happens to work with their particular EM MAC
in a Gigabit-only environment.
In head, the TSO workaround for speeds other than Gigabit was lost with
the conversion to iflib(9) in r311849 (possibly along with another one
or two TSO workarounds). Yet at the same time, for EM-class MACs TSO4
got enabled by default again, causing device hangs. Therefore, change the
default for this hardware class back to have TSO4 off, allowing users
to turn it on manually if it happens to work in their environment as
we do in stable/{10,11}. An alternative would be to add a whitelist of
EM-class devices where TSO4 actually is reliable with the workarounds in
place, but given that the advantage of TSO at Gigabit speed is rather
limited - especially with the overhead of these workarounds -, that's
really not worth it. [1]
This change includes the addition of an isc_capabilities to struct
if_softc_ctx so iflib(9) can also handle interface capabilities that
shouldn't be enabled by default which is used to handle the default-off
capabilities of e1000 as suggested by shurd@ and moving their handling
from em_setup_interface() to em_if_attach_pre() accordingly.
- Although 82543 support TSO4 in theory, the former lem(4) didn't have
support for TSO4, presumably because TSO4 is even more broken in the
LEM-class of MACs than the later EM ones. Still, TSO4 for LEM-class
devices was enabled as part of the conversion to iflib(9) in r311849,
causing device hangs. So revert back to the pre-r311849 behavior of
not supporting TSO4 for LEM-class at all, which includes not creating
a TSO DMA tag in iflib(9) for devices not having IFCAP_TSO4 set. [2]
- In fact, the FreeBSD TCP stack can handle a TSO size of IP_MAXPACKET
(65535) rather than FREEBSD_TSO_SIZE_MAX (65518). However, the TSO
DMA must have a maxsize of the maximum TSO size plus the size of a
VLAN header for software VLAN tagging. The iflib(9) converted em(4),
thus, first correctly sets scctx->isc_tx_tso_size_max to EM_TSO_SIZE
in em_if_attach_pre(), but later on overrides it with IP_MAXPACKET
in em_setup_interface() (apparently, left-over from pre-iflib(9)
times). So remove the later and correct iflib(9) to correctly cap
the maximum TSO size reported to the stack at IP_MAXPACKET. While at
it, let iflib(9) use if_sethwtsomax*().
This change includes the addition of isc_tso_max{seg,}size DMA engine
constraints for the TSO DMA tag to struct if_shared_ctx and letting
iflib_txsd_alloc() automatically adjust the maxsize of that tag in case
IFCAP_VLAN_MTU is supported as requested by shurd@.
- Move the if_setifheaderlen(9) call for adjusting the maximum Ethernet
header length from {ixgbe,ixl,ixlv,ixv,em}_setup_interface() to iflib(9)
so adjustment is automatically done in case IFCAP_VLAN_MTU is supported.
As a consequence, this adjustment now is also done in case of bnxt(4)
which missed it previously.
- Move the reduction of the maximum TSO segment count reported to the
stack by the number of m_pullup(9) calls (which in the worst case,
can add another mbuf and, thus, the requirement for another DMA
segment each) in the transmit path for performance reasons from
em_setup_interface() to iflib_txsd_alloc() as these pull-ups are now
done in iflib_parse_header() rather than in the no longer existing
em_xmit(). Moreover, this optimization applies to all drivers using
iflib(9) and not just em(4); all in-tree iflib(9) consumers still
have enough room to handle full size TSO packets. Also, reduce the
adjustment to the maximum number of m_pullup(9)'s now performed in
iflib_parse_header().
- Prior to the conversion of em(4)/igb(4)/lem(4) and ixl(4) to iflib(9)
in r311849 and r335338 respectively, these drivers didn't enable
IFCAP_VLAN_HWFILTER by default due to VLAN events not being passed
through by lagg(4). With iflib(9), IFCAP_VLAN_HWFILTER was turned on
by default but also lagg(4) was fixed in that regard in r203548. So
just remove the now redundant and defunct IFCAP_VLAN_HWFILTER handling
in {em,ixl,ixlv}_setup_interface().
- Nuke other redundant IFCAP_* setting in {em,ixl,ixlv}_setup_interface()
which is (more completely) already done in {em,ixl,ixlv}_if_attach_pre()
now.
- Remove some redundant/dead setting of scctx->isc_tx_csum_flags in
em_if_attach_pre().
- Remove some IFCAP_* duplicated either directly or indirectly (e. g.
via IFCAP_HWCSUM) in {EM,IGB,IXL}_CAPS.
- Don't bother to fiddle with IFCAP_HWSTATS in ixgbe(4)/ixgbev(4) as
iflib(9) adds that capability unconditionally.
- Remove some unused macros from em(4).
- Bump __FreeBSD_version as some of the above changes require the modules
of drivers using iflib(9) to be recompiled.

Okayed by: sbruno@ at 201806 DevSummit Transport Working Group [1]
Reviewed by: sbruno (earlier version), erj
PR: 219428 (part of; comment #10) [1], 220997 (part of; comment #3) [2]
Differential Revision: https://reviews.freebsd.org/D15720


# 3e0e6330 29-May-2018 Stephen Hurd <shurd@FreeBSD.org>

iflib: mark irq allocation name parameter as constant

The *name parameter passed to iflib_irq_alloc_generic and
iflib_softirq_alloc_generic is never modified. Many places in code pass
string literals and thus should not be modified.

Mark the *name parameter as a const char * instead, so that we enforce
that the name is not modified before passing to bus_describe_intr()

Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: kmacy
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D15343


# 1d7ef186 25-May-2018 Eric Joyner <erj@FreeBSD.org>

iflib: Add new shared flag: IFLIB_ADMIN_ALWAYS_RUN

ixl(4)'s nvmupdate utility expects the nvmupdate process to run
while the interface is down; these nvm update commands use the
admin queue, so the admin queue needs to be able to generate
interrupts and be processed while the interface is down.

So add a flag that ixl(4) sets that lets the entire admin task
run even when the interface is marked down/IFF_DRV_RUNNING isn't set.

With this change, nvmupdate should function like it did pre-iflib.

Reviewed by: gallatin@, sbruno@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D15575


# 09f6ff4f 11-May-2018 Matt Macy <mmacy@FreeBSD.org>

iflib(9): Add support for cloning pseudo interfaces

Part 3 of many ...
The VPC framework relies heavily on cloning pseudo interfaces
(vmnics, vpc switch, vcpswitch port, hostif, vxlan if, etc).

This pulls in that piece. Some ancillary changes get pulled
in as a side effect.

Reviewed by: shurd@
Approved by: sbruno@
Sponsored by: Joyent, Inc.
Differential Revision: https://reviews.freebsd.org/D15347


# aa8a24d3 03-May-2018 Stephen Hurd <shurd@FreeBSD.org>

Allow iflib NIC drivers to sleep rather than busy wait

Since the move to SMP NIC driver locking has had to go through serious
contortions using mtx around long running hardware operations. This moves
iflib past that.

Individual drivers may now sleep when appropriate.

Submitted by: Matthew Macy <mmacy@mattmacy.io>
Reviewed by: shurd
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14983


# 81ad57b1 21-Feb-2018 Stephen Hurd <shurd@FreeBSD.org>

IFLIB: Make isc_magic unsigned

The IFLIB_MAGIC macro is > INT_MAX, so isc_magic should
be able to contain it.

Reported by: jeb
Sponsored by: Limelight Networks


# 79554b40 22-Dec-2017 Warner Losh <imp@FreeBSD.org>

The device tables end with a sentinel in iflib. Don't include the
sentinel in the output.


# d2064cf0 22-Dec-2017 Warner Losh <imp@FreeBSD.org>

Use '#' rather than some made up name for fields we want to ignore.


# 96fc97c8 19-Dec-2017 Stephen Hurd <shurd@FreeBSD.org>

Update Matthew Macy contact info

Email address has changed, uses consistent name (Matthew, not Matt)

Reported by: Matthew Macy <mmacy@mattmacy.io>
Differential Revision: https://reviews.freebsd.org/D13537


# d14c853b 05-Dec-2017 Stephen Hurd <shurd@FreeBSD.org>

iflib: Support to padding Ethernet frames to a min size

Some bnxt devices do not correctly send frames smaller than
52 bytes (without CRC), so add a quirk that will pad frames to an
arbitrary size before passing off to the encap routine.

Reported by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D13269


# 1c0054d2 05-Oct-2017 Stephen Hurd <shurd@FreeBSD.org>

Fix "taskqgroup_attach: setaffinity failed: 3" with iflib drivers

Improved logging added in r323879 exposed an error during
attach. We need the irq, not the rid to work correctly. em uses
shared irqs, so it will use the same irq for TX as RX. bnxt does
not use shared irqs, or TX irqs at all, so there's no need to set
the TX irq affinity.

Reviewed by: sbruno
Approved by: sbruno (mentor)
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12496


# 916616c4 26-Sep-2017 Conrad Meyer <cem@FreeBSD.org>

Add PNP metadata to more drivers

GPUs: radeonkms, i915kms
NICs: if_em, if_igb, if_bnxt

This metadata isn't used yet, but it will be handy to have later to
implement automatic module loading.

Reviewed by: imp, mmacy
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12488


# c5cf2172 22-Sep-2017 Stephen Hurd <shurd@FreeBSD.org>

Some small packet performance improvements

If the packet is smaller than MTU, disable the TSO flags.
Move TCP header parsing inside the IS_TSO?() test.
Add a new IFLIB_NEED_ZERO_CSUM flag to indicate the checksums need to be zeroed before TX.

Reviewed by: sbruno
Approved by: sbruno (mentor)
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12442


# ab2e3f79 15-Sep-2017 Stephen Hurd <shurd@FreeBSD.org>

Revert r323516 (iflib rollup)

This was really too big of a commit even if everything worked, but there
are multiple new issues introduced in the one huge commit, so it's not
worth keeping this until it's fixed.

I'll work on splitting this up into logical chunks and introduce them one
at a time over the next week or two.

Approved by: sbruno (mentor)
Sponsored by: Limelight Networks


# d300df01 12-Sep-2017 Stephen Hurd <shurd@FreeBSD.org>

Roll up iflib commits from github. This pulls in most of the work done
by Matt Macy as well as other changes which he has accepted via pull
request to his github repo at https://github.com/mattmacy/networking/

This should bring -CURRENT and the github repo into close enough sync to
allow small feature branches rather than a large chain of interdependant
patches being developed out of tree. The reset of the synchronization
should be able to be completed on github by splitting the remaining
changes that are not yet ready into short feature branches for later
review as smaller commits.

Here is a summary of changes included in this patch:

1) More checks when INVARIANTS are enabled for eariler problem
detection
2) Group Task Queue cleanups
- Fix use of duplicate shortdesc for gtaskqueue malloc type.
Some interfaces such as memguard(9) use the short description to
identify malloc types, so duplicates should be avoided.
3) Allow gtaskqueues to use ithreads in addition to taskqueues
- In some cases, this can improve performance
4) Better logging when taskqgroup_attach*() fails to set interrupt
affinity.
5) Do not start gtaskqueues until they're needed
6) Have mp_ring enqueue function enter the ABDICATED rather than BUSY
state. This moves the TX to the gtaskq and allows processing to
continue faster as well as make TX batching more likely.
7) Add an ift_txd_errata function to struct if_txrx. This allows
drivers to inspect/modify mbufs before transmission.
8) Add a new IFLIB_NEED_ZERO_CSUM for drivers to indicate they need
checksums zeroed for checksum offload to work. This avoids modifying
packet data in the TX path when possible.
9) Use ithreads for iflib I/O instead of taskqueues
10) Clean up ioctl and support async ioctl functions
11) Prefetch two cachlines from each mbuf instead of one up to 128B. We
often need to parse packet header info beyond 64B.
12) Fix potential memory corruption due to fence post error in
bit_nclear() usage.
13) Improved hang detection and handling
14) If the packet is smaller than MTU, disable the TSO flags.
This avoids extra packet parsing when not needed.
15) Move TCP header parsing inside the IS_TSO?() test.
This avoids extra packet parsing when not needed.
16) Pass chains of mbufs that are not consumed by lro to if_input()
rather call if_input() for each mbuf.
17) Re-arrange packet header loads to get as much work as possible done
before a cache stall.
18) Lock the context when calling IFDI_ATTACH_PRE()/IFDI_ATTACH_POST()/
IFDI_DETACH();
19) Attempt to distribute RX/TX tasks across cores more sensibly,
especially when RX and TX share an interrupt. RX will attempt to
take the first threads on a core, and TX will attempt to take
successive threads.
20) Allow iflib_softirq_alloc_generic() to request affinity to the same
cpus an interrupt has affinity with. This allows TX queues to
ensure they are serviced by the socket the device is on.
21) Add new iflib sysctls to net.iflib:
- timer_int - interval at which to run per-queue timers in ticks
- force_busdma
22) Add new per-device iflib sysctls to dev.X.Y.iflib
- rx_budget allows tuning the batch size on the RX path
- watchdog_events Count of watchdog events seen since load
23) Fix error where netmap_rxq_init() could get called before
IFDI_INIT()
24) e1000: Fixed version of r323008: post-cold sleep instead of DELAY
when waiting for firmware
- After interrupts are enabled, convert all waits to sleeps
- Eliminates e1000 software/firmware synchronization busy waits after
startup
25) e1000: Remove special case for budget=1 in em_txrx.c
- Premature optimization which may actually be incorrect with
multi-segment packets
26) e1000: Split out TX interrupt rather than share an interrupt for
RX and TX.
- Allows better performance by keeping RX and TX paths separate
27) e1000: Separate igb from em code where suitable
Much easier to understand separate functions and "if (is_igb)" than
previous tests like "if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))"

#blamebruno

Reviewed by: sbruno
Approved by: sbruno (mentor)
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12235


# a9693502 30-Aug-2017 Sean Bruno <sbruno@FreeBSD.org>

Revert r323008 and its conversion of e1000/iflib to using SX locks.

This seems to be missing something on the 82574L causing NFS root mounts
to hang.

Reported by: kib


# e17e5b41 29-Aug-2017 Sean Bruno <sbruno@FreeBSD.org>

Continuation of lock cleanup in e1000.

Post-cold sleep instead of DELAY when waiting for firmware.

Convert softc mutex to an SX lock. Change all waits to sleeps
once interrupts are enabled (and it is safe to sleep).

Submitted by: Matt Macy <matt@mattmacy.io>
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12101


# 9bc7588c 27-Jul-2017 Sean Bruno <sbruno@FreeBSD.org>

Deprecate unused int isc_max_txqsets and int isc_max_rxqsets as they
were redundant and not being used to set anything up.

Submitted by: Matt Macy <mmacy@mattmacy.io>
Reported by: Jeb Cramer <cramerj@intel.com>
Sponsored by: Limelight Networks


# eb36b1d0 30-Jun-2017 Jason A. Harmening <jah@FreeBSD.org>

Clean up MD pollution of bus_dma.h:

--Remove special-case handling of sparc64 bus_dmamap* functions.
Replace with a more generic mechanism that allows MD busdma
implementations to generate inline mapping functions by
defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>. This
is currently useful for sparc64, x86, and arm64, which all
implement non-load dmamap operations as simple wrappers
around map objects which may be bus- or device-specific.

--Remove NULL-checked bus_dmamap macros. Implement the
equivalent NULL checks in the inlined x86 implementation.
For non-x86 platforms, these checks are a minor pessimization
as those platforms do not currently allow NULL maps. NULL
maps were originally allowed on arm64, which appears to have
been the motivation behind adding arm[64]-specific barriers
to bus_dma.h, but that support was removed in r299463.

--Simplify the internal interface used by the bus_dmamap_load*
variants and move it to bus_dma_internal.h

--Fix some drivers that directly include sys/bus_dma.h
despite the recommendations of bus_dma(9)

Reviewed by: kib (previous revision), marius
Differential Revision: https://reviews.freebsd.org/D10729


# 60596476 06-Apr-2017 Sean Bruno <sbruno@FreeBSD.org>

Move pause frame counter out of struct if_ctx and into struct if_softc_ctx_t
so that we can use it in iflib to detect pause frames.

The igb(4) driver definitely used to use this in its old timer function and
I see no reason to restrict it to that driver only.

Sponsored by: Limelight Networks


# ea351d3f 04-Apr-2017 Sean Bruno <sbruno@FreeBSD.org>

Allow MSIX to be turned off by tuneable per interface, per driver.

Sponsored by: Limelight Networks


# 95246abb 13-Mar-2017 Sean Bruno <sbruno@FreeBSD.org>

IFLIB updates
- unconditionally enable BUS_DMA on non-x86 architectures
- speed up rxd zeroing via customized function
- support out of order updates to rxd's
- add prefetching to hardware descriptor rings
- only prefetch on 10G or faster hardware
- add seperate tx queue intr function
- preliminary rework of NETMAP interfaces, WIP

Submitted by: Matt Macy <mmacy@nextbsd.org>
Sponsored by: Limelight Networks


# e035717e 27-Jan-2017 Sean Bruno <sbruno@FreeBSD.org>

IFLIB updates:
We found routing performance dropped significantly when configuring
FreeBSD as a router, we are applying the following changes in order to
resolve those issues and hopefully perform better.
- don't prefetch the flags array, we usually don't need it
- prefetch the next cache line of each of the software descriptor arrays as
well as the first cache line of each of the next four packets' mbufs and
clusters
- reduce max copy size to 63 bytes
- convert rx soft descriptors from array of structures to a structure of arrays
- update copyrights

Submitted by: Matt Macy <mmacy@nextbsd.org>


# 1248952a 01-Jan-2017 Sean Bruno <sbruno@FreeBSD.org>

2017 IFLIB updates in preparation for commits to e1000 and ixgbe.
- iflib - add checksum in place support (mmacy)
- iflib - initialize IP for TSO (going to be needed for e1000) (mmacy)
- iflib - move isc_txrx from shared context to softc context (mmacy)
- iflib - Normalize checks in TXQ drainage. (shurd)
- iflib - Fix queue capping checks (mmacy)
- iflib - Fix invalid assert, em can need 2 sentinels (mmacy)
- iflib - let the driver determine what capabilities are set and what
tx csum flags are used (mmacy)
- add INVARIANTS debugging hooks to gtaskqueue enqueue (mmacy)
- update bnxt(4) to support the changes to iflib (shurd)

Some other various, sundry updates. Slightly more verbose changelog:

Submitted by: mmacy@nextbsd.org
Reviewed by: shurd
mFC after:
Sponsored by: LimeLight Networks and Dell EMC Isilon


# 23ac9029 12-Aug-2016 Stephen Hurd <shurd@FreeBSD.org>

Update iflib to support more NIC designs

- Move group task queue into kern/subr_gtaskqueue.c
- Change intr_enable to return an int so it can be detected if it's not
implemented
- Allow different TX/RX queues per set to be different sizes
- Don't split up TX mbufs before transmit
- Allow a completion queue for TX as well as RX
- Pass the RX budget to isc_rxd_available() to allow an earlier return
and avoid multiple calls

Submitted by: shurd
Reviewed by: gallatin
Approved by: scottl
Differential Revision: https://reviews.freebsd.org/D7393


# 4c7070db 17-May-2016 Scott Long <scottl@FreeBSD.org>

Import the 'iflib' API library for network drivers. From the author:

"iflib is a library to eliminate the need for frequently duplicated device
independent logic propagated (poorly) across many network drivers."

Participation is purely optional. The IFLIB kernel config option is
provided for drivers that want to transition between legacy and iflib
modes of operation. ixl and ixgbe driver conversions will be committed
shortly. We hope to see participation from the Broadcom and maybe
Chelsio drivers in the near future.

Submitted by: mmacy@nextbsd.org
Reviewed by: gallatin
Differential Revision: D5211