History log of /freebsd-current/sys/dev/e1000/if_em.c
Revision Date Author Comments
# 0d6d28ce 09-Jan-2024 Marius Strobl <marius@FreeBSD.org>

e1000(4): Remove disconnected SYSCTL

The global hw.em.rx_process_limit knob has been replaced by the device-
specific dev.IF.N.iflib.rx_budget along with the conversion to iflib(4).

While at it, remove the - besides initialization of tx_process_limit -
unused {r,t}x_process_limit members.


# 725e4008 24-Aug-2023 Kevin Bowling <kbowling@FreeBSD.org>

iflib: invert default restart on VLAN changes

In rS360398, a new iflib device method was added to opt out of VLAN
events needing an interface reset.

I am switching the default to not requiring a restart for:
* VLAN events
* unknown events

After fixing various bugs, I do not think this would be a common need
of hardware and it is undesirable from the user's perspective causing
link flaps and much slower VLAN configuration. Currently, there are no
other restart events besides VLAN events, and setting the
ifdi_needs_restart default to false will alleviate the need to churn
every driver if an odd event is added in the future for specific
hardware.

markj points out this could cause churn in the other direction; I will
solve that problem with an event registration system as he mentions in
the review should we need it in the future.

These drivers will opt into restart and need further inspection or work:
* ixv (needs code audit, 61a8231 fixed principal issue; re-init probably
not necessary)
* axgbe (needs code audit; re-init probably not necessary)
* iavf - (needs code audit; interaction with Malicious Driver Detection
mentioned in rS360398)
* mgb - no VLAN functions are currently implemented. Left a comment.

MFC after: 2 weeks
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D41558


# 51e23514 16-Aug-2023 Marius Strobl <marius@FreeBSD.org>

iflib drivers: Constify PCI ID LUTs

Since d49e83eac3baf16a22b1c5d42e8438b68b17e6f9, iflib(9) is ready
for this change.
While at it, make isc_driver_version strings (static) const where
not apparently un-const on purpose, too.
This reduces the size of the amd64 GENERIC by about 10 KiB.


# 71625ec9 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c comment pattern

Remove /^/[*/]\s*\$FreeBSD\$.*\n/


# 797e480c 14-Aug-2023 Kevin Bowling <kbowling@FreeBSD.org>

e1000: disable TSO on lem(4) and em(4)

Disable TSO on lem(4) and em(4) until a ring stall can be debugged.

I am not able to reproduce the issue on lem(4) but disabling there in
abundance of caution in case the issue is not specific to em(4).

Reported by: grog


# 13da8423 09-Aug-2023 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Enable TSO on 82574

Further testing indicates something wrong with a particular receiver;
enabling TSO on 82574 for wider testing.

Tested by: karels
MFC after: 3 months


# f1b5488f 03-Aug-2023 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Enable TSO for lem(4) and em(4)

Most em(4) devices now enjoy TSO and TSO6, matching NetBSD and Linux
defaults.

A prior commit automasks TSO on 10/100 Ethernet due to errata, and other
bugs for IPv6 were fixed recently, allowing this.

Mike Karels identified a performance anomaly on Intel 82574L devices.
These are multiqueue enabled on FreeBSD since the conversion to
iflib. I am investigating whether this can be fixed; in the meantime,
MSI-X with checksum offloads remains the default.

i219 SPT devices have an erratum that downclocks the DMA engine, which
results in TSO not being able to achieve line rate. Therefore, it is
disabled on:
* Intel(R) I219-LM and I219-V SPT
* Intel(R) I219-LM and I219-V SPT-H (2)
* Intel(R) I219-LM and I219-V LBG (3)
* Intel(R) I219-LM and I219-V SPT (4)
* Intel(R) I219-LM and I219-V SPT (5)

Many lem(4) devices enjoy TSO, exceptions being 82542, 82543, 82547.
TSO6 may be possible for some chipsets but I am still working through
my testing matrix and that is hidden behind hw.em.unsupported_tso.

If you encounter issues, you may disable TSO with, for example:
ifconfig em0 -tso -tso6
I ask to be informed of any deviations from normal operation requiring
this.

Thanks to cc@ for access to emulab.net.

On a sample I219 system it saves about 16% CPU on IPv4 and 19% on IPv6.

iperf3 -Vc reported numbers:

                          total%   user%   system%
IPv4 TSO                  21.3     7       14.4
                          21.4     6       15.4
                          21.5     6       15.5

IPv4 no TSO               36.8     5.4     31.4
                          38.5     5.1     33.5
                          38.2     5.7     32.6

IPv4 no TSO no TXCSUM     45.1     5.8     39.3
                          46       6.3     39.7
                          46.2     5.9     40.4

IPv6 TSO6                 21.7     5.4     16.3
                          21.6     5.1     16.5
                          21.9     5.6     16.3

IPv6 no TSO6              41.2     5.2     36
                          41       5.1     36
                          40.8     5.2     35.7

IPv6 no TSO6 no TXCSUM6   49       5.9     43.1
                          48.8     4.9     43.9
                          49       5.6     43.4
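
The "about 16%" and "19%" figures quoted above are consistent with the
difference, in percentage points, between the mean total CPU% without and
with the offload. A minimal sketch of that arithmetic follows; the helper
names are illustrative only and are not part of the driver or of iperf3.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (illustrative helper, not driver code). */
static double
mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return (sum / (double)n);
}

/* Percentage points of total CPU saved by turning the offload on. */
static double
cpu_saved(const double *offload_off, const double *offload_on, size_t n)
{
	return (mean(offload_off, n) - mean(offload_on, n));
}

/* total% values copied from the iperf3 runs in this commit message. */
static const double ipv4_tso[]     = { 21.3, 21.4, 21.5 };
static const double ipv4_no_tso[]  = { 36.8, 38.5, 38.2 };
static const double ipv6_tso6[]    = { 21.7, 21.6, 21.9 };
static const double ipv6_no_tso6[] = { 41.2, 41.0, 40.8 };
```

Calling cpu_saved(ipv4_no_tso, ipv4_tso, 3) and
cpu_saved(ipv6_no_tso6, ipv6_tso6, 3) gives values a little above 16 and
19 respectively.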

Tested by: cc (lem(4)), karels (82574L)
MFC after: 3 months
Relnotes: yes
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D41170


# 2ddf24f8 02-Aug-2023 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Automask TSO on lem(4)/em(4) 10/100 Ethernet

This feature masks TSO capability when a link comes up at 10 or 100 Mbit/s
due to errata on the chips. This behavior matches previous versions of
FreeBSD as well as NetBSD and Linux.

A tunable, hw.em.unsupported_tso, may be set if the admin desires to
disable automasking and configure TSO settings manually.
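
The masking rule described above can be sketched as follows. This is a
simplified model, not the driver's code: the CAP_* bits and the
tso_automask() helper are hypothetical stand-ins, and the boolean
parameter models the hw.em.unsupported_tso tunable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits, standing in for the real ifnet flags. */
#define	CAP_TXCSUM	(1u << 0)
#define	CAP_TSO4	(1u << 1)
#define	CAP_TSO6	(1u << 2)

/*
 * Return the capabilities that stay enabled once the link is up.
 * TSO is masked at 10 and 100 Mbit/s unless the admin opted out of
 * automasking (modeled here by unsupported_tso).
 */
static uint32_t
tso_automask(uint32_t caps, unsigned int link_mbps, bool unsupported_tso)
{
	assert(link_mbps == 10 || link_mbps == 100 || link_mbps == 1000);

	if (unsupported_tso)
		return (caps);	/* admin manages TSO manually */
	if (link_mbps < 1000)
		return (caps & ~(uint32_t)(CAP_TSO4 | CAP_TSO6));
	return (caps);
}
```

In this model only the TSO bits are affected; checksum offload is left
alone at every link speed.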

MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41170


# e1353dcb 31-Jul-2023 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Fix lem(4)/em(4) TSO6

* Fix TSO6 by specializing IP checksum insertion and following Intel SDM
values for IPv6.
* Remove unnecessary 82544 IP-bit handling
* Remove TSO6 from lem(4) capabilities

Reviewed by: erj (earlier version)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41170


# 38588749 28-Jul-2023 Kevin Bowling <kbowling@FreeBSD.org>

e1000: HWCSUM exemption fixes

Also disable IPV6 checksum offload.

Spell hw->mac.type < e1000_82543 as e1000_82542. Confusingly, chips
like 82540 and 82541 come later and do not have these issues. There
is no functional change here, as the enum was defined in such a way
it worked correctly. But this reads literally.
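
Why the two spellings are equivalent can be sketched with an abbreviated
enum. The enum below is a hypothetical stand-in that only mirrors the
ordering described above (82542 first, 82540 and 82541 after 82543); it is
not the shared-code definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Abbreviated, hypothetical stand-in for the MAC type enum. */
enum mac_type {
	MAC_UNDEFINED = 0,
	MAC_82542,
	MAC_82543,
	MAC_82544,
	MAC_82540,
	MAC_82541,
	MAC_NUM
};

/* Old spelling: relies on 82542 being the only real type below 82543. */
static bool
exempt_old(enum mac_type t)
{
	assert(t < MAC_NUM);
	return (t < MAC_82543);
}

/* New spelling: names the affected chip directly. */
static bool
exempt_new(enum mac_type t)
{
	assert(t < MAC_NUM);
	return (t == MAC_82542);
}
```

For every identified MAC type the two predicates agree; they differ only
for the undefined value, which a probed device never carries.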

MFC after: 1 week


# cbcab907 27-Jul-2023 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Corrections for lem(4)/em(4) txcsum offload

Explicitly set ipcss/ipcse/ipcso for IPv6 per intel SDM as indicated in
inline comments.

Fix and consolidate 82543/82547 hwcsum exemption.

While here rearrange and expand some commentary.


# 918c2567 22-Jul-2023 Kevin Bowling <kbowling@FreeBSD.org>

e1000: lem(4)/em(4) ifcaps, TSO and hwcsum fixes

* em(4) obey administrative ifcaps for using hwcsum offload
* em(4) obey administrative ifcaps for hw vlan receive tagging
* em(4) add additional TSO6 ifcap, but disabled by default as is TSO4
* lem(4) obey administrative ifcaps for using hwcsum offload
* lem(4) add support for hw vlan receive tagging
* lem(4) Add ifcaps for TSO offload experimentation, but disabled by
default due to errata and possibly missing txrx code.
* lem(4) disable HWCSUM ifcaps by default on 82547 due to errata around
full duplex links. It may still be administratively enabled.

Reviewed by: markj (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30072


# 5d3c9825 21-Jul-2023 Kevin Bowling <kbowling@FreeBSD.org>

Revert "e1000: lem(4)/em(4) ifcaps, TSO and hwcsum fixes"

Seems to cause a panic when booting under VirtualBox.

Reported by: yasu

This reverts commit 95f7b36e8fac45092b9a4eea5e32732e979989f0.


# 95f7b36e 20-Jul-2023 Kevin Bowling <kbowling@FreeBSD.org>

e1000: lem(4)/em(4) ifcaps, TSO and hwcsum fixes

* em(4) obey administrative ifcaps for using hwcsum offload
* em(4) obey administrative ifcaps for hw vlan receive tagging
* em(4) add additional TSO6 ifcap, but disabled by default as is TSO4
* lem(4) obey administrative ifcaps for using hwcsum offload
* lem(4) add support for hw vlan receive tagging
* lem(4) Add ifcaps for TSO offload experimentation, but disabled by
default due to errata and possibly missing txrx code.
* lem(4) disable HWCSUM ifcaps by default on 82547 due to errata around
full duplex links. It may still be administratively enabled.

Reviewed by: markj (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30072


# 0229fab2 10-May-2023 Kristof Provost <kp@FreeBSD.org>

e1000: fix VLAN 0

VLAN 0 essentially means "Treat as untagged, but with priority bits",
and is used by some ISPs.

On igb/em interfaces we did not receive packets with VLAN tag 0 unless
vlanhwfilter was disabled.

This can be fixed by explicitly listing VLAN 0 in the hardware VLAN
filter (VFTA). Do this from em_setup_vlan_hw_support(), where we already
(re-)write the VFTA.
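
A rough sketch of the idea, using a software shadow of the filter table
rather than real register writes. The layout assumed here (128 words of
32 bits, one bit per VLAN ID) and all names are assumptions of this
sketch, not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	VFTA_WORDS	128	/* 4096 VLAN IDs / 32 bits per word */

/* Software shadow of the VLAN filter table (hypothetical type). */
struct vfta_shadow {
	uint32_t word[VFTA_WORDS];
};

static void
vfta_set(struct vfta_shadow *s, uint16_t vid)
{
	assert(vid < 4096);
	s->word[vid >> 5] |= (uint32_t)1 << (vid & 0x1f);
}

static bool
vfta_test(const struct vfta_shadow *s, uint16_t vid)
{
	assert(vid < 4096);
	return ((s->word[vid >> 5] >> (vid & 0x1f)) & 1);
}

/* Rebuild the table from the configured VLANs, always listing VLAN 0. */
static void
vfta_rebuild(struct vfta_shadow *s, const uint16_t *vids, size_t n)
{
	memset(s, 0, sizeof(*s));
	vfta_set(s, 0);		/* accept priority-tagged (VLAN 0) frames */
	for (size_t i = 0; i < n; i++)
		vfta_set(s, vids[i]);
}
```

With this approach VLAN 0 passes the filter even when no VLAN interface
has been configured for it.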

Reviewed by: kbowling
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D40046


# 905ae588 08-Feb-2023 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Add support for future client platforms

MFC after: 2 weeks
Sponsored by: BBOX.io


# d36fbdb0 08-Feb-2023 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Remove redundant disable_ulp for ich8lan

This call only makes sense for ich8lan, and the shared code does it in
e1000_setup_init_funcs() above this deletion.

Obtained from: DPDK
MFC after: 2 weeks
Sponsored by: BBOX.io
Pull Request: https://github.com/freebsd/freebsd-src/pull/539


# 647f2d2b 08-Feb-2023 Kevin Bowling <kbowling@FreeBSD.org>

e1000: bump driver version

Incrementing these to avoid confusion in users; we are on par with these
out of tree versions.

Reviewed by: erj
MFC after: 2 weeks
Sponsored by: BBOX.io
Pull Request: https://github.com/freebsd/freebsd-src/pull/540


# ae1dca79 08-Feb-2023 Kevin Bowling <kbowling@FreeBSD.org>

e1000: fix I219 hang on reset

Clear the rings before reset to avoid a HW hang.

Inspired by em-7.7.8 and DPDK (1fc9701238edcf0541289b9ae15565b6d9d7ab30)

Reviewed by: erj
MFC after: 2 weeks
Sponsored by: BBOX.io
Pull Request: https://github.com/freebsd/freebsd-src/pull/540


# c0548bfc 06-Feb-2023 Piotr Kubaj <pkubaj@FreeBSD.org>

em(4): Add IDs for new Intel(R) I219 devices

These include I219 (20) through I219 (23), which ends at Raptor Lake.

This also corrects a discrepancy where the (16) devices should be
mac type "e1000_pch_tgp" and not "e1000_pch_adp".

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

PR: 269224
Reviewed by: erj@
MFC after: 1 day
Relnotes: yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D38376


# 402810d3 20-Oct-2021 Justin Hibbits <jhibbits@FreeBSD.org>

Convert iflib(4) and iflib-based drivers to the DrvAPI

Summary:
Convert iflib(4) and the following drivers:
* axgbe
* em
* ice
* ixl
* vmxnet

Sponsored by: Juniper Networks, Inc.
Reviewed by: kbowling, #iflib
Differential Revision: https://reviews.freebsd.org/D37768


# 21cc0918 16-Aug-2021 Elliott Mitchell <ehem+freebsd@m5p.com>

sys: Nuke double-semicolons

A distinct number of double-semicolons have ended up in FreeBSD. Take a
pass at getting rid of many of these harmless typos.

Reviewed by: emaste, rrs
Pull Request: https://github.com/freebsd/freebsd-src/pull/609
Differential Revision: https://reviews.freebsd.org/D31716


# 66dad2db 12-Oct-2022 Kevin Bowling <kbowling@FreeBSD.org>

Revert "e1000: Try auto-negotiation for fixed 100 or 10 configuration"

This reverts commit 9ab4dfce8feda8cf3545be0c3c7569095b1fcd24.

OPNsense users have reported a regression with fixed configs.

The e1000 api is not ready for this change.


# 6987c475 12-May-2022 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Increase rx_buffer_size to 32b

Extend the size of the local rx_buffer_size variable to account for
larger buffer sizes possible on 82580, i350 chips.

From i350 datasheet, 6.2.10 Initialization Control 4 (LAN Base Address
+ Offset 0x13):
When 4 ports are enabled maximum buffer size is 36 KB. When 2 ports are
enabled maximum buffer size is 72 KB. When only a single port is
enabled maximum buffer size is 144 KB.

and 8.3:
The overall available internal buffer size in the I350 for all ports is
144 KB for receive buffers and 80 KB for transmit Buffers. Disabled
ports memory can be shared between active ports and sharing can be
asymmetric. The default buffer size for each port is loaded from the
EEPROM on initialization.

From the reporter:
But for I350 when only 2 ports are used PBA size can be set as 72KB
(see datasheet RXPbsize or e1000_rxpbs_adjust_82580 function in
e1000_82575.c). In this case calculating the rx_buffer_size overflows
as 0x0048 << 10 = 73728 or 0x12000 pushed into u16. It is then set as
0x2000 or 8192.
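
The truncation the reporter describes can be reproduced in isolation. The
function names below are illustrative; only the shift by 10 and the 16-bit
versus 32-bit result types come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Old behavior: the byte count is stored in a 16-bit variable. */
static uint16_t
rx_buffer_size_u16(uint32_t pba_kb)
{
	return ((uint16_t)((pba_kb & 0xffff) << 10));
}

/* New behavior: a 32-bit variable holds the full byte count. */
static uint32_t
rx_buffer_size_u32(uint32_t pba_kb)
{
	return ((pba_kb & 0xffff) << 10);
}

/* Reproduces the numbers quoted by the reporter for a 72 KB buffer. */
static void
check_reported_case(void)
{
	assert(rx_buffer_size_u32(0x0048) == 0x12000);	/* 73728 bytes */
	assert(rx_buffer_size_u16(0x0048) == 0x2000);	/* wraps to 8192 */
}
```

Any packet buffer size of 64 KB or more overflows the 16-bit variant,
which is why the 72 KB and 144 KB configurations were affected.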

PR: 263896
Reported by: hannula@gmail.com
Tested by: hannula@gmail.com
Approved by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D35167


# 9e057054 19-Apr-2022 John Baldwin <jhb@FreeBSD.org>

em/igb: Remove unused devclass arguments to DRIVER_MODULE.


# 9ab4dfce 13-Apr-2022 J.R. Oldroyd <fbsd@opal.com>

e1000: Try auto-negotiation for fixed 100 or 10 configuration

Currently if an e1000 interface is set to a fixed media configuration,
for gigabit, it will participate in auto-negotiation as required by
IEEE 802.3-2018 Clause 37. However, if set to fixed media configuration
for 100 or 10, it does NOT participate in auto-negotiation.

By my reading of Clauses 28 and 37, while auto-negotiation is optional
for 100 and 10, it is not prohibited and is, in fact, "highly
recommended".

This patch enables auto-negotiation for fixed 100 and 10 media
configuration, in a similar manner to that already performed for 1000.
I.e., the patch enables advertising of just the manually configured
settings with the goal of allowing the remote end to match the manually
configured settings if it has them available.

To be clear, this patch does NOT allow an em(4) interface that has been
manually configured with specific media settings to respond to
auto-negotiation by then configuring different parameters to those that
were manually configured. The intent of this patch is to fully comply
with the requirements of Clause 37, but for 100 and 10.

The need for this has arisen on an em(4) link where the other end is
under a different administrative control and is set to full
auto-negotiation. Due to the cable length GigE is not working well. It
is desired to set the em(4) end to "media 100baseTX mediatype
full-duplex" which does work when both ends are configured that way.
Currently, because em(4) does not participate in autoneg for this
setting, the remote defaults to half-duplex - i.e., there's a duplex
mismatch and things don't work. With this patch, em(4) would inform the
remote that it has only 100baseTX full, the remote would match that and
it will work.

Approved by: erj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D34449


# 07ede751 13-Apr-2022 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Update mc filter before RCTL flags

Update mc filter array before changing RCTL flags as in 5a3eb6207a35

Approved by: grehan
MFC after: 2 weeks


# e0f4cdba 08-Dec-2021 Vincenzo Maffione <vmaffione@FreeBSD.org>

e1000: fix interface capabilities management

The e1000 drivers (em, lem, igb) are currently looking at the
iflib copies of the capabilities bitvectors (scctx->isc_capabilities
and scctx->isc_capenable) rather than the ifnet ones
(ifp->if_capabilities and ifp->if_capenable). However, the latter
are the ones that are actually updated by ifconfig and that should
be used by the drivers during interface operation. The former are
set by the driver on interface attach (for iflib internal use)
and should not be used anymore by the driver.
This patch fixes the e1000 driver to use the correct bitvectors.

PR: 260068
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33154


# 293663f4 06-Oct-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: print EEPROM/NVM/OROM versions

This is useful for diagnosing problems. In particular, the errata
sheets identify the EEPROM version for many fixes.

Reviewed by: gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32333


# 9b3e252e 06-Oct-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Lock nvm print sysctl

Otherwise results in KASSERT with debug kernels because we rely on the
iflib CTX lock to implement the software serialization to the NVM model

Reviewed by: gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32333


# 28ccd780 06-Oct-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Function prototype cleanup

Drop arguments of function prototypes since the file is mixed between
listing arg names and not.

No functional changes

Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D32329


# 450c3f8b 27-Sep-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Re-arm link changes

A change to MSI-X link handler was somehow causing issues on
MSI-based em(4) NICs.

Revert the change based on user reports and testing.

PR: 258551
Reported by: Franco Fichtner <franco@opnsense.org>, t_uemura@macome.co.jp
Reviewed by: markj, Franco Fichtner <franco@opnsense.org>
Tested by: t_uemura@macome.co.jp
MFC after: 1 day


# dc926051 24-Sep-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Rename 'struct adapter' to 'struct e1000_sc'

Rename the 'struct adapter' to 'struct e1000_sc' to avoid type ambiguity
in things like kgdb.

Reviewed by: jhb, markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D32129


# 1bbdc25f 16-Sep-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Use C99 bool types

Approved by: imp
MFC after: 1 week


# 6b9d35fa 06-Jul-2020 Guinan Sun <guinanx.sun@intel.com>

e1000: remove duplicated phy codes

Add two files base.c and base.h to reduce the redundancy
in the silicon family code.
Remove the code duplication from e1000_82575 files.
Clean family specific functions from base.
Fix up a stray and duplicate function declaration.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>

Approved by: imp
Obtained from: DPDK (44dddd14059f151f39f7e075b887decfc9a10f11)
MFC after: 1 week


# 82a9d0c2 06-Jul-2020 Guinan Sun <guinanx.sun@intel.com>

e1000: add missing device ID

Adding Intel(R) I210 Gigabit Network Connection 15F6 device ID for SGMII
flashless automotive device.

Signed-off-by: Kamil Bednarczyk <kamil.bednarczyk@intel.com>
Signed-off-by: Guinan Sun <guinanx.sun@intel.com>
Reviewed-by: Wei Zhao <wei.zhao1@intel.com>

Approved by: imp
Obtained from: DPDK (586d770bfefc01d4af97c0ddf17c960c3e49ec22)
MFC after: 1 week


# a4378873 08-Sep-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Revert Update intel shared code

This reverts commit fc7682b17f3738573099b8b03f5628dcc8148adb.

This will be done incrementally to help with bisecting an issue in
later I21x devices (ich8lan).

PR: 258153
Approved by: imp
MFC after: 1 day


# 22b20b45 15-Sep-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Fix variable typo

Forgot to git add this in last commit

Reported by: jenkins
Fixes: 2796f7cab107
MFC after: 2 weeks


# 2796f7ca 15-Sep-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Fix up HW vlan ops

* Don't reset the entire adapter for vlan changes, fix up the problems
* Add some functions for vlan filter (vfta) manipulation
* Don't muck with the vfta if we aren't doing HW vlan filtering
* Disable interrupts when manipulating vfta on lem(4)-class NICs
* On the I350 there is a specification update (2.4.20) in which the
suggested workaround is to write to the vfta 10 times (if at first you
don't succeed, try, try again). Our shared code has the goods, use it
* Increase a VF's frame receive size in the case of vlans

From the referenced PR, this reduced vlan configuration from minutes
to seconds with hundreds or thousands of vlans and prevents wedging the
adapter with needless adapter reinitialization for each vlan ID.

PR: 230996
Reviewed by: markj
Tested by: Ozkan KIRIK <ozkan.kirik@gmail.com>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30002


# 0e5811a2 20-Aug-2021 Kevin Bowling <kbowling@FreeBSD.org>

intel ethernet: Use ether_gen_addr(9)

Use ether_gen_addr(9) for VF MAC generation

Reviewed by: Intel Networking (erj), kevans
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31616


# fc7682b1 19-Aug-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Update intel shared code

Sync the e1000 shared code with DPDK shared code
"cid-gigabit.2020.06.05.tar.gz released by ND"

Primary focus was on client platforms (ich8lan). More work remains here
but we need an Intel contact for client networking.

Reviewed by: grehan, Intel Networking (erj, earlier rev)
Obtained from: DPDK <http://git.dpdk.org/dpdk/tree/drivers/net/e1000/base>
MFC after: 1 week
Sponsored by: me
Differential Revision: https://reviews.freebsd.org/D31547


# 69e8e8ea 16-Aug-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: always enable PCSD when RSS hashing

To enable RSS hashing in the NIC, the PCSD bit must be set.

By default, this is never set when RXCSUM is disabled - which
causes problems higher up in the stack.

While here improve the RXCSUM flag assignments when enabling or
disabling IFCAP_RXCSUM.

See also: https://lists.freebsd.org/pipermail/freebsd-current/2020-May/076148.html

Reviewed by: markj, Franco Fichtner <franco@opnsense.org>,
Stephan de Wit <stephan.dewt@yahoo.co.uk>
Obtained from: OPNsense
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31501
Co-authored-by: Stephan de Wit <stephan.dewt@yahoo.co.uk>
Co-authored-by: Franco Fichtner <franco@opnsense.org>


# 12e8addd 10-Aug-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: rctl/srrctl buffer size init, rfctl fix

Simplify the setup of srrctl.BSIZEPKT on igb class NICs.
Improve the setup of rctl.BSIZE on lem and em class NICs.
Don't try to touch rfctl on lem class NICs.
Manipulate rctl.BSEX correctly on lem and em class NICs.

Approved by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31457


# eea55de7 25-Apr-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Rework em_msi_link interrupt filter

* Fix 82574 Link Status Changes, carrying the OTHER mask bit around as
needed.
* Move igb-class LSC re-arming out of FAST back into the handler.
* Clarify spurious/other interrupt re-arms in FAST.

In MSI-X mode, 82574 and igb-class devices use an interrupt filter to
handle Link Status Changes. We want to do LSC re-arms in the handler
to take advantage of autoclear (EIAC) single shot behavior.

82574 uses 'Other' in ICR and IMS for LSC interrupt types when in MSI-X
mode, so we need to set and re-arm the 'Other' bit during attach and
after ICR reads in the FAST handler if not an LSC or after handling on
LSC due to autoclearing.

This work was primarily done to address the referenced PR, but inspired
some clarification and improvement for igb-class devices once the
intentions of previous bug fix attempts became clearer.

PR: 211219
Reported by: Alexey <aserp3@gmail.com>
Tested by: kbowling (I210 lagg), markj (I210)
Approved by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29943


# ba7b31b3 26-Apr-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Fix register name in reg_dump sysctl

The correct name of this register is CTRL_EXT.

Approved by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29967


# 0f6bea61 20-Apr-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Improve device name strings

This is just clerical work to ease bug triage and may be used to set
expectations around the ability for anyone in the community to perform
testing and development on older parts (this driver covers over 20 years
of silicon)

Reviewed by: erj
Approved by: markj
Sponsored by: Pink Floyd - Any Colour You Like (in kind)
Differential Revision: https://reviews.freebsd.org/D29872


# 59690eab 18-Apr-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Add support for [Tiger, Alder, Meteor] Lake

Add support for current and future client platform PCI IDs. These are
all I219 variants and have no known driver changes versus previous
generation client platform I219 variants.

Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29801


# 4b38eed7 16-Apr-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Correct promisc multicast filter handling

There are a number of issues in the e1000 multicast filter handling
that have been present for a long time. Take the updated approach from
ixgbe(4) which does not have the issues.

The issues are outlined in the PR, in particular this solves crossing
over and under the hardware's filter limit, not programming the
hardware filter when we are above its limit, disabling SBP (show bad
packets) when the tunable is enabled and exiting promiscuous mode, and
an off-by-one error in the em_copy_maddr function.

PR: 140647
Reported by: jtl
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29789


# 68a46f11 15-Apr-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: Restore VF interface random MAC

Restore 525e07418c77 after the iflib conversion of igb(4). This
reenables random MAC address generation when attaching to a VF with a
zeroed MAC.

PR: 253535
Reported by: Balaev PA <mail@void.so>
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29785


# bb1b375f 15-Apr-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: fix em_mac_min and 82547 packet buffer

The boundary differentiating "lem" vs "em" class devices was wrong
after the iflib conversion of lem(4).

The Packet Buffer size for 82547 class chips was not set correctly
after the iflib conversion of lem(4).

These changes restore functionality on an 82547 for the submitter.

PR: 236119
Reported by: Jeff Gibbons <jgibbons@protogate.com>
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29766


# 548d8a13 15-Apr-2021 Kevin Bowling <kbowling@FreeBSD.org>

e1000: disable hw.em.sbp debug setting

This is a debugging tunable that shouldn't have retained this setting
after the initial iflib conversion of the driver

PR: 248934
Reported by: Franco Fichtner <franco@opnsense.org>
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29768


# ffe3def9 07-Mar-2021 Mark Johnston <markj@FreeBSD.org>

iflib: Make if_shared_ctx_t a pointer to const

This structure is shared among multiple instances of a driver, so we
should ensure that it doesn't somehow get treated as if there's a
separate instance per interface. This is especially important for
software-only drivers like wg.

DEVICE_REGISTER() still returns a void * and so the per-driver sctx
structures are not yet defined with the const qualifier.

Reviewed by: gallatin, erj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29102


# 38bfc6de 01-Feb-2021 Sai Rajesh Tallamraju <stallamr@netapp.com>

iflib: Free resources in a consistent order during detach

Memory and PCI resources are freed in no particular order. This could
cause use-after-frees when detaching following a failed attach. For
instance, iflib_tx_structures_free() frees ctx->ifc_txqs[] but
iflib_tqg_detach() attempts to access this array. Similarly, adapter
queues get freed by IFDI_QUEUES_FREE() but IFDI_DETACH() attempts to
access adapter queues to free PCI resources.

MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D27634


# c262e8e8 27-Jan-2021 Marius Strobl <marius@FreeBSD.org>

e1000: fix build after c1655b0f


# c1655b0f 23-Jan-2021 Marius Strobl <marius@FreeBSD.org>

e1000: consistently use the hw variables

It's rather confusing when adapter->hw and hw are mixed and matched
within a particular function.
Some of this was missed in cd1cf2fc1d49c509ded05dcd41b7600a5957fb9a
and r353778 respectively.


# e07d687e 16-Dec-2020 Jessica Clarke <jrtc27@FreeBSD.org>

Fix whitespace in r368698

MFC with: r368698


# 63d973c3 16-Dec-2020 Michal Meloun <mmel@FreeBSD.org>

Use the standard method for localizing of MSI-X table bar.

The current approach, a hardcoded value plus a heuristic, does not conform
to the PCI(e) specification, and it fails on systems where the MSI-X BAR is
not initialized by BIOS/ACPI (many arm or arm64 systems, for example).
Instead, use the standard PCI(e) capability to determine the
MSI-X table BAR address.

MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D27265


# a3cd2439 02-Dec-2020 Mitchell Horne <mhorne@FreeBSD.org>

em: fix a null de-reference in em_free_pci_resources

A failure in iflib_device_register() can result in
em_free_pci_resources() being called after receive queues have already
been freed. In particular, a failure to allocate IRQ resources will goto
fail_queues, where IFDI_QUEUES_FREE() will be called via
iflib_tx_structures_free(), preceding the call to IFDI_DETACH().

Cope with this by checking adapter->rx_queues before dereferencing it.
A similar check is present in ixgbe(4) and ixl(4).
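
The shape of such a guard, as a self-contained sketch. The structures and
the function are simplified, hypothetical stand-ins; the real routine
releases bus resources rather than clearing pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the per-queue and per-device state. */
struct rxq {
	void *irq_res;
};

struct softc {
	struct rxq *rx_queues;	/* NULL once the queues have been freed */
	int rx_num_queues;
};

/* Release per-queue IRQ resources; returns how many were released. */
static int
free_pci_resources(struct softc *sc)
{
	int released = 0;

	assert(sc != NULL);
	/* Queues may already be gone after a failed attach. */
	if (sc->rx_queues == NULL)
		return (0);
	for (int i = 0; i < sc->rx_num_queues; i++) {
		struct rxq *que = &sc->rx_queues[i];

		if (que->irq_res != NULL) {
			que->irq_res = NULL;	/* stands in for the release */
			released++;
		}
	}
	return (released);
}
```

The early return keeps the detach path safe regardless of whether the
queue-free step ran first.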

MFC after: 1 week
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27260


# a3b9a736 15-Sep-2020 Eric Joyner <erj@FreeBSD.org>

e1000: Properly retain promisc flag

From Franco:
The iflib rewrite forced the promisc flag but it was not reported
to the system. Noticed on a stock VM that went into unsolicited
promisc mode when dhclient was started during bootup.

PR: 248869
Submitted by: Franco Fichtner <franco@opnsense.org>
Reviewed by: erj@
MFC after: 3 days


# 7e6223b2 06-Aug-2020 Vincenzo Maffione <vmaffione@FreeBSD.org>

em(4): honor vlanhwtag offload

The FreeBSD em driver fails to properly reset the VME flag
in the e1000 CTRL register on the following ifconfig command

ifconfig em1 -vlanhwtag

Tested on the e1000 device emulated by QEMU, and on a real
NIC (chip=0x10d38086).

PR: 236584
Submitted by: murat@sunnyvalley.io
Reported by: murat@sunnyvalley.io
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D25286


# 104d75a0 11-Jun-2020 Eric Joyner <erj@FreeBSD.org>

em(4): Always reinit interface when adding/removing VLAN

This partially reverts r361053 since there have been reports
by users that this breaks some functionality for em(4)
devices; it seems at first glance that some sort of interface
restart is required for those cards.

This isn't a proper fix; this unbreaks those users until a proper
fix is found for their issues.

PR: 240818
Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
MFC after: 3 days


# 51569bd7 04-Jun-2020 Eric Joyner <erj@FreeBSD.org>

em(4): Add support for Comet Lake Mobile Platform, update shared code

This change introduces Comet Lake Mobile Platform support in the e1000
driver along with shared code patches described below.

- Cast return value of e1000_ltr2ns() to higher type to avoid overflow
- Remove useless statement of assigning act_offset
- Add initialization of identification LED
- Fix flow control setup after connected standby:
After connected standby the driver blocks resets during
"AdapterStart" and skips flow control setup. This change adds
condition in e1000_setup_link_ich8lan() to always setup flow control
and to setup physical interface only when there is no need to block
resets.

Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>

Submitted by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reviewed by: erj@
Tested by: Jeffrey Pieper <jeffrey.e.pieper@intel.com>
MFC after: 1 week
Relnotes: yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D25035


# cf150917 11-May-2020 Eric Joyner <erj@FreeBSD.org>

em/ix/ixv/ixl/iavf: Implement ifdi_needs_restart iflib method

Pursuant to r360398, implement driver-specific versions of the
ifdi_needs_restart iflib device method.

Some (if not most?) Intel network cards don't need reinitializing when a
VLAN is added or removed from the device hardware, so these implement
ifdi_needs_restart in a way that tells iflib not to bring the interface
up or down when a VLAN is added or removed, regardless of whether the
VLAN_HWFILTER interface capability flag is set or not.

This could potentially solve several PRs relating to link flaps that
occur when VLANs are added/removed to devices.
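
A minimal sketch of the decision such a method makes, assuming a
stand-in event enum (restart_event and its members are hypothetical
names, not the iflib definitions): VLAN changes do not ask for a
restart, anything else does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the event type passed by the framework. */
enum restart_event {
	RESTART_VLAN_CONFIG,	/* a VLAN was added or removed */
	RESTART_OTHER		/* any other event */
};

/*
 * Sketch of a driver-specific needs_restart method: tell the
 * framework not to reinitialize the interface on VLAN changes.
 */
static bool
sketch_needs_restart(enum restart_event event)
{
	switch (event) {
	case RESTART_VLAN_CONFIG:
		return (false);
	default:
		return (true);
	}
}
```

Returning true for unknown events keeps the conservative behavior for
anything the driver has not been audited against.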

Signed-off-by: Eric Joyner <erj@freebsd.org>

PR: 240818, 241785
Reviewed by: gallatin@, olivier@
MFC after: 3 days
MFC with: r360398
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D24659


# 20b91f0a 24-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (15 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.


# 927dd68e 19-Jan-2020 George V. Neville-Neil <gnn@FreeBSD.org>

Add support for latest Intel I219 device, supported in Lenovo Carbon X1 v7

MFC after: 2 weeks


# e81998f4 04-Nov-2019 Eric Joyner <erj@FreeBSD.org>

net: prefer ETHER_ADDR_LEN over ETH_ADDR_LEN

A couple of drivers and one place in if.c use ETH_ADDR_LEN, even though
net/ethernet.h provides an equivalent ETHER_ADDR_LEN definition.

Clean up all of the locations which refer to ETH_ADDR_LEN to use the
standard ETHER_ADDR_LEN instead.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: erj@, jpaetzel@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21239


# c4298026 21-Oct-2019 Gleb Smirnoff <glebius@FreeBSD.org>

Convert to if_foreach_llmaddr() KPI.


# cd1cf2fc 20-Oct-2019 Marius Strobl <marius@FreeBSD.org>

- In em_intr(), just call em_handle_link() instead of duplicating it.
- In em_msix_link(), properly handle IGB-class devices after the iflib(4)
conversion again by only setting EM_MSIX_LINK for the EM-class 82574
and by re-arming link interrupts unconditionally, i. e. not only in
case of spurious interrupts. This fixes the interface link state change
detection for the IGB-class. [1]
- In em_if_update_admin_status(), only re-arm the link state change
interrupt for 82574 and also only if such a device uses MSI-X, i. e.
takes advantage of autoclearing. In case of INTx and MSI as well as
for LEM- and IGB-class devices, re-arming isn't appropriate here and
setting EM_MSIX_LINK isn't either.
While at it, consistently take advantage of the hw variable.

PR: 236724 [1]
Differential Revision: https://reviews.freebsd.org/D21924


# ea0e3f4d 16-Oct-2019 Eric Joyner <erj@FreeBSD.org>

e1000: correctly set isc_pause_frames only when XOFF increases

From Jake:
The e1000 driver sets the iflib shared context isc_pause_frames value to
the number of received xoff frames. This is done so that the iflib
watchdog timer won't trigger a Tx Hang due to pause frames.

Unfortunately, the function simply sets it to the value of the xoffrxc
counter. Once the device has received a single XOFF packet, the driver
always reports that we received pause frames. This will prevent the Tx
hang detection entirely from that point on.

Fix this by assigning isc_pause_frames to a non-zero value if we
received any XOFF packets in the last timer interval.

We could attempt to calculate the total number of received packets by
doing a subtraction, but the iflib stack only seems to check if
isc_pause_frames is non-zero.
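
The fix can be sketched roughly as follows. The structure and field
names are hypothetical, and the sketch assumes the hardware counter is
read as a per-interval delta that the driver accumulates.

```c
#include <stdint.h>

/* Hypothetical per-device state; field names are illustrative only. */
struct pause_state {
	uint64_t xoffrxc_total;	/* accumulated XOFF frames received */
	int	 pause_frames;	/* non-zero: XOFF seen in this interval */
};

/*
 * Called once per timer interval with the number of XOFF frames
 * reported since the previous call (assumed delta semantics).
 */
static void
update_pause_frames(struct pause_state *st, uint32_t xoff_delta)
{
	uint64_t prev = st->xoffrxc_total;

	st->xoffrxc_total += xoff_delta;
	/* Flag pause frames only if the total grew in this interval. */
	st->pause_frames = (st->xoffrxc_total != prev) ? 1 : 0;
}
```

With this shape the flag drops back to zero in any interval without
new XOFF frames, so hang detection is suppressed only while pause
frames are actually arriving.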

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: gallatin@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21868


# ca2ebb27 07-May-2019 Marius Strobl <marius@FreeBSD.org>

o Avoid determining the MAC class (LEM/EM or IGB) - possibly even multiple
times - on every interrupt by using an own set of device methods for the
IGB class. This translates to introducing igb_if_intr_{disable,enable}()
and igb_if_{rx,tx}_queue_intr_enable() with that IGB-specific code moved
out of their EM counterparts and otherwise continuing to use the EM IFDI
methods also for IGB.
Note that igb_if_intr_{disable,enable}() also issue E1000_WRITE_FLUSH as
lost with the conversion of igb(4) to iflib(4).
Also note, that the em_if_{disable,enable}_intr() methods are renamed to
em_if_intr_{disable,enable}() for consistency with the names used in the
interface declaration.
o In em_intr():
- Don't bother to bail out if the interrupt type is "legacy", i. e. INTx
or MSI, as iflib(4) doesn't use ift_legacy_intr methods for MSI-X. All
other iflib(4)-based drivers avoid this check, too.
- Given that only the MSI-X interrupts have one-shot behavior (by taking
advantage of the EIAC register), explicitly disable interrupts. Hence,
em_intr() now matches what {em,igb}_irq_fast() previously did (in case
of igb(4) supposedly also to work around MSI message reordering errata
on certain systems).
o In em_if_intr_disable():
- Clear the EIAC register unconditionally for 82574 and not just in case
of MSI-X, matching em_if_intr_enable() and bringing back the last hunk
of r206437 lost with the iflib(4) conversion.
- Write to EM_EIAC for clearing said register instead of to the IGB-only
E1000_EIAC used ever since the iflib(4) conversion.
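
The first item above, selecting class-specific methods once instead of
testing the MAC class on every interrupt, can be illustrated with a
small function-pointer table. All names here are hypothetical and the
method bodies are placeholders that only report which path ran.

```c
/* Hypothetical MAC classes; illustrative only. */
enum mac_class { CLASS_EM, CLASS_IGB };

/* Illustrative method table standing in for a set of device methods. */
struct intr_methods {
	enum mac_class (*intr_enable)(void);
	enum mac_class (*intr_disable)(void);
};

/* Each variant holds only its own code path, with no class test. */
static enum mac_class em_intr_enable(void)   { return (CLASS_EM); }
static enum mac_class em_intr_disable(void)  { return (CLASS_EM); }
static enum mac_class igb_intr_enable(void)  { return (CLASS_IGB); }
static enum mac_class igb_intr_disable(void) { return (CLASS_IGB); }

static const struct intr_methods em_methods = {
	em_intr_enable, em_intr_disable
};
static const struct intr_methods igb_methods = {
	igb_intr_enable, igb_intr_disable
};

/* Decide once at attach time; hot paths then call through the table. */
static const struct intr_methods *
select_methods(enum mac_class cls)
{
	return (cls == CLASS_IGB ? &igb_methods : &em_methods);
}
```

The class check runs once in select_methods(); every later call goes
straight through the chosen table.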

Reviewed by: shurd
Differential Revision: https://reviews.freebsd.org/D20176


# 1b9d9394 19-Mar-2019 Eric Joyner <erj@FreeBSD.org>

iflib: expose the Rx mbuf buffer size to drivers

From Jake:
iflib_fl_setup calculates a suitable buffer size for the Rx mbufs based
on the isc_max_frame_size value that drivers set up. This calculation is
repeated by drivers when programming their hardware with the size of
each Rx buffer.

This can lead to a mismatch where the iflib mbuf size is different from
the expected size of the buffer as programmed by the hardware. This can
lead to unexpected results.

If iflib ever wants to support mbuf sizes larger than one page, every
driver must be updated to account for the new possible buffer sizes.

Fix this by calculating the mbuf size prior to calling IFDI_INIT, and
adding the iflib_get_rx_mbuf_sz function which will expose this value to
drivers, so that they do not repeat the same calculation.
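
A sketch of the idea, not the iflib code itself: the framework computes
the buffer size once and drivers read it back through an accessor. The
struct, the function names and the two size constants are assumptions
made for this illustration.

```c
#include <stdint.h>

/* Assumed sizes for illustration; real values come from system headers. */
#define SKETCH_MCLBYTES		2048u	/* standard mbuf cluster */
#define SKETCH_MJUMPAGESIZE	4096u	/* page-sized cluster */

/* Hypothetical context shared between framework and driver. */
struct sketch_ctx {
	uint32_t max_frame_size;	/* set by the driver */
	uint32_t rx_mbuf_sz;		/* computed once by the framework */
};

/* Framework side: run before the driver's init method is called. */
static void
sketch_calc_rx_mbuf_sz(struct sketch_ctx *ctx)
{
	if (ctx->max_frame_size <= SKETCH_MCLBYTES)
		ctx->rx_mbuf_sz = SKETCH_MCLBYTES;
	else
		ctx->rx_mbuf_sz = SKETCH_MJUMPAGESIZE;
}

/* Driver side: read the value instead of recomputing it. */
static uint32_t
sketch_get_rx_mbuf_sz(const struct sketch_ctx *ctx)
{
	return (ctx->rx_mbuf_sz);
}
```

Because only the framework chooses the size, a later change to the
selection policy would not require touching each driver.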

Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: shurd@, erj@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D19489


# bc408c7d 05-Mar-2019 Eric Joyner <erj@FreeBSD.org>

Remove references to CONTIGMALLOC_WORKS in iflib and em

From Jake:
"The iflib_fl_setup() function tries to pick various buffer sizes based
on the max_frame_size value defined by the parent driver. However, this
code was wrapped under CONTIGMALLOC_WORKS, which was never actually
defined anywhere.

This same code pattern was used in if_em.c, likely trying to match
what iflib uses.

Since CONTIGMALLOC_WORKS is not defined, remove this dead code from
iflib_fl_setup and if_em.c

Given that various iflib drivers appear to be using a similar
calculation, it might be worth making this buffer size a value that the
driver can peek at in the future."

Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: shurd@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D19199


# 6143b977 09-Feb-2019 Marius Strobl <marius@FreeBSD.org>

- Remove the redundant device disabled hint handling; ever since
r241119 that's performed globally by device_attach(9).
- As for the EM-class of devices, em(4) supports multiple queues
and MSI-X respectively only with 82574 devices. However, since
the conversion to iflib(4), em(4) relies on the interrupt type
fallback mechanism, i. e. MSI-X -> MSI -> INTx, of iflib(4) to
figure out the interrupt type to use for the EM-class (as well
as the IGB-class) of MACs. Moreover, despite the datasheet for
82583V not mentioning any support of MSI-X, there actually are
82583V devices out there that report a varying number of MSI-X
messages as supported. The interrupt type fallback of iflib(4)
is causing two failure modes depending on the actual number of
MSI-X messages supported for such instances of 82583V:
1) With only one MSI-X message supported, none is left for the
RX/TX queues as that one message gets assigned to the admin
interrupt. Worse, later on - which will be addressed with a
separate fix - iflib(4) interprets that one message as MSI
or INTx to be set up, but fails to actually do so as it has
previously called pci_alloc_msix(9). [1, 2]
2) With more messages supported, their distribution is okay but
then em_if_msix_intr_assign() doesn't work for 82583V, with
the interface being left in a non-working state, too. [3]
Thus, let em_if_attach_pre() indicate to iflib(4) to try MSI-X
with 82574 only, and at most MSI for the remainder of EM-class
devices.
While at it, remove "try_second_bar" as its polarity is inverted
and not actually needed.
- Remove code from em_if_timer() that effectively is a NOP since
the conversion to iflib(4) ("trigger" is no longer read).
While at it, let the comment for em_if_timer() reflect reality
after said conversion.
- Implement an ifdi_watchdog_reset method which only updates the
em(4) "watchdog_events" counter but doesn't perform any reset,
so that the em(4) "watchdog_timeouts" SYSCTL (iflib(4) doesn't
provide a counterpart) reflects reality and these timeouts add
to IFCOUNTER_OERRORS again after the iflib(4) conversion.
- Remove the "mbuf_defrag_fail" and "tx_dma_fail" SYSCTLS; since
the iflib(4) conversion, associated counters are disconnected,
but iflib(4) provides "mbuf_defrag_failed" and "tx_map_failed"
respectively as equivalents.
- Move the description preceding lem_smartspeed() to the correct
spot before em_reset() and bring back appropriate comments for
{igb,em}_initialize_rss_mapping() and lem_smartspeed() lost in
the iflib(4) conversion.
- Adapt some other function descriptions and INIT_DEBUGOUT() use
to match reality after the iflib(4) conversion.
- Put the debugging message of em_enable_vectors_82574() (missed
in r343578) under bootverbose, too.
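
The interrupt-type cap described in the second item above (MSI-X for
82574 only, at most MSI for the remaining EM-class devices) amounts to
a decision like the one below. The enums and the function name are
hypothetical; the real driver communicates this to iflib(4) through
its softc context rather than through a helper like this.

```c
/* Hypothetical interrupt types, ordered from least to most capable. */
enum intr_type { INTR_LEGACY, INTR_MSI, INTR_MSIX };

/* Hypothetical subset of EM-class MAC types. */
enum em_mac { MAC_82574, MAC_82583, MAC_OTHER_EM };

/*
 * Best interrupt type the framework should try for an EM-class MAC.
 * The usual fallback (MSI-X -> MSI -> INTx) starts from this value.
 */
static enum intr_type
em_max_intr_type(enum em_mac mac)
{
	return (mac == MAC_82574 ? INTR_MSIX : INTR_MSI);
}
```

Capping by MAC type, not by the advertised message count, is what
sidesteps 82583V parts that report MSI-X messages they cannot use.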

PR: 219428 [1], 235246 [2], 235147 [3]
Reviewed by: erj (previous version)
Differential Revision: https://reviews.freebsd.org/D19108


# d533db84 08-Feb-2019 Patrick Kelsey <pkelsey@FreeBSD.org>

Fix em(4) interrupt routing

When configured with more tx queues than rx queues,
em_if_msix_intr_assign() was incorrectly routing the tx event
interrupts.

Reviewed by: erj, marius
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D19070


# b97de13a 30-Jan-2019 Marius Strobl <marius@FreeBSD.org>

- Stop iflib(4) from leaking MSI messages on detachment by calling
bus_teardown_intr(9) before pci_release_msi(9).
- Ensure that iflib(4) and associated drivers pass correct RIDs to
bus_release_resource(9) by obtaining the RIDs via rman_get_rid(9)
on the corresponding resources instead of using the RIDs initially
passed to bus_alloc_resource_any(9) as the latter function may
change those RIDs. Solely em(4) for the ioport resource (but not
others) and bnxt(4) were using the correct RIDs by caching the ones
returned by bus_alloc_resource_any(9).
- Change the logic of iflib_msix_init() around to only map the MSI-X
BAR if MSI-X is actually supported, i. e. pci_msix_count(9) returns
> 0. Otherwise the "Unable to map MSIX table " message triggers for
devices that simply don't support MSI-X and the user may think that
something is wrong while in fact everything works as expected.
- Put some (mostly redundant) debug messages emitted by iflib(4)
and em(4) during attachment under bootverbose. The non-verbose
output of em(4) seen during attachment now is close to the one
prior to the conversion to iflib(4).
- Replace various variants of spelling "MSI-X" (several in messages)
with "MSI-X" as used in the PCI specifications.
- Remove some trailing whitespace from messages emitted by iflib(4)
and change them to consistently start with uppercase.
- Remove some obsolete comments about releasing interrupts from
drivers and correct a few others.

Reviewed by: erj, Jacob Keller, shurd
Differential Revision: https://reviews.freebsd.org/D18980


# 088a0b27 23-Jan-2019 Eric Joyner <erj@FreeBSD.org>

intel iflib drivers: correct initialization of tx_cidx_processed

From Jake:

In r341156 ("Fix first-packet completion", 2018-11-28) a hack to work
around a delta calculation determining how many descriptors were used
was added to ixl_isc_tx_credits_update_dwb.

The same fix was also applied to the em and igb drivers in r340310, and
to ix in r341156.

The hack checked the case where prev and cur were equal, and then added
one. This works, because by the time we do the delta check, we already
know there is at least one packet available, so the delta should be at
least one.

However, it's not a complete fix, and as indicated by the comment is
really a hack to work around the real bug.

The real problem is that the first time that we transmit a packet,
tx_cidx_processed will be set to point to the start of the ring.
Ultimately, the credits_update function expects it to point to the
*last* descriptor that was processed. Since we haven't yet processed any
descriptors, pointing it to 0 results in this incorrect calculation.

Fix the initialization code to have it point to the end of the ring
instead. One way to think about this, is that we are setting the value
to be one prior to the first available descriptor.

Doing so, corrects the delta calculation in all cases. The original fix
only works if the first packet has exactly one descriptor. Otherwise, we
will report 1 less than the correct value.

As part of this fix, also update the MPASS assertions to match the real
expectations. First, ensure that prev is not equal to cur, since this
should never happen. Second, remove the assertion about prev==0 || delta
!= 0. It looks like that originated from when the em driver was
converted to iflib. It seems like it was supposed to ensure that delta
was non-zero. However, because we originally returned 0 delta for the
first calculation, the "prev == 0" was tacked on.

Instead, replace this with a check that delta is greater than zero,
after the correction necessary when the ring pointers wrap around.

This new solution should fix the same bug as r341156 did, but in a more
robust way.
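
The delta calculation and the corrected initial value can be sketched
as below. Function names are hypothetical and the assertions stand in
for the MPASS checks mentioned above; this is an illustration of the
arithmetic, not the driver code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * prev: index of the last descriptor already processed
 * cur:  index of the last descriptor the hardware has completed
 * ntxd: number of descriptors in the ring
 */
static int32_t
tx_credits_delta(uint16_t prev, uint16_t cur, uint16_t ntxd)
{
	int32_t delta;

	assert(prev != cur);	/* caller knows something completed */
	delta = (int32_t)cur - (int32_t)prev;
	if (delta < 0)
		delta += ntxd;	/* the ring index wrapped around */
	assert(delta > 0);
	return (delta);
}

/* Start one slot before the first available descriptor (index 0). */
static uint16_t
initial_cidx_processed(uint16_t ntxd)
{
	return ((uint16_t)(ntxd - 1));
}
```

For a 1024-entry ring, a first packet that occupies descriptors 0
through 2 gives prev = 1023 and cur = 2, so the delta is 3. With the
old initial value of 0 the same packet would have yielded 2.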

Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: shurd@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D18545


# 382000a1 07-Oct-2018 Eric van Gyzen <vangyzen@FreeBSD.org>

em/igb: Do not print link state messages

These messages are totally redundant with the iflib messages.
They're also not very useful, since they don't include the
interface name.

Discussed with: shurd
Approved by: re (rgrimes)
Sponsored by: Dell EMC Isilon


# 861437f8 20-Sep-2018 Stephen Hurd <shurd@FreeBSD.org>

Add IFCAP_TSO6 for igb

It seems igb supports TSO6, but the capability got lost in
the iflib update. Restore this capability.

PR: 231476
Reported by: lev
Reviewed by: erj
Approved by: re (gjb)
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D17242


# 73ed47f0 13-Aug-2018 Marius Strobl <marius@FreeBSD.org>

Remove the duplicated CSUM_IP6_TCP introduced in r311849 from the TX
checksum capabilities of IGB-class MACs. While at it, fix the line
wrapping.

PR: 230571


# 9820d945 22-Jul-2018 Marius Strobl <marius@FreeBSD.org>

o In em_if_update_admin_status():
- Don't bother calling if_setbaudrate(9) as iflib_link_state_change(9)
takes care of that,
- correctly check for E1000_CTRL_EXT_LINK_MODE_GMII in E1000_CTRL_EXT [1],
- properly convert the uint16_t link_speed to a uint64_t baudrate by
using IF_Mbps() which contains an appropriate cast [2],
- remove the duplicate link down announcement when bootverbose isn't
zero and bring the remaining one in line with the other link state
messages.
o Remove a dead store to rid in em_if_msix_intr_assign(). [3]
o Or in the DMA coalescing Rx threshold so the other bits set in E1000_DMACR
remain intact as intended in igb_init_dmac(). [4]
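
The link speed conversion in the first item above is a widening
problem: the value must be promoted to 64 bits before it is scaled.
The helper below only illustrates that idea; it is not the definition
of IF_Mbps(), and the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Convert a link speed in Mbit/s to a baudrate in bit/s.
 * Widening before the multiplication keeps the product in 64 bits.
 */
static uint64_t
mbps_to_baudrate(uint16_t link_speed)
{
	return ((uint64_t)link_speed * 1000000ULL);
}
```

Without the cast, the multiplication would be carried out in int, which
cannot represent results above roughly 2.1 Gbit/s on common platforms.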

Reported by: Coverity
CID: 1378464 [1], 1368765 [2], 1381681 [3], 1304929 [4]


# c1176e63 16-Jul-2018 Marius Strobl <marius@FreeBSD.org>

Update igb_sctx_init for r336313, missed when incorporating shurd@'s
feedback on the initial D15720.

Reported by: kib


# 7f87c040 15-Jul-2018 Marius Strobl <marius@FreeBSD.org>

Assorted TSO fixes for em(4)/iflib(9) and dead code removal:
- Ever since the workaround for the silicon bug of TSO4 causing MAC hangs
was committed in r295133, CSUM_TSO always got disabled unconditionally
by em(4) on the first invocation of em_init_locked(). However, even with
that problem fixed, it turned out that for at least e. g. 82579 not all
necessary TSO workarounds are in place, still causing MAC hangs even at
Gigabit speed. Thus, for stable/11, TSO usage was deliberately disabled
in r323292 (r323293 for stable/10) for the EM-class by default, allowing
users to turn it on if it happens to work with their particular EM MAC
in a Gigabit-only environment.
In head, the TSO workaround for speeds other than Gigabit was lost with
the conversion to iflib(9) in r311849 (possibly along with another one
or two TSO workarounds). Yet at the same time, for EM-class MACs TSO4
got enabled by default again, causing device hangs. Therefore, change the
default for this hardware class back to have TSO4 off, allowing users
to turn it on manually if it happens to work in their environment as
we do in stable/{10,11}. An alternative would be to add a whitelist of
EM-class devices where TSO4 actually is reliable with the workarounds in
place, but given that the advantage of TSO at Gigabit speed is rather
limited - especially with the overhead of these workarounds -, that's
really not worth it. [1]
This change includes the addition of an isc_capabilities to struct
if_softc_ctx so iflib(9) can also handle interface capabilities that
shouldn't be enabled by default which is used to handle the default-off
capabilities of e1000 as suggested by shurd@ and moving their handling
from em_setup_interface() to em_if_attach_pre() accordingly.
- Although 82543 supports TSO4 in theory, the former lem(4) didn't have
support for TSO4, presumably because TSO4 is even more broken in the
LEM-class of MACs than the later EM ones. Still, TSO4 for LEM-class
devices was enabled as part of the conversion to iflib(9) in r311849,
causing device hangs. So revert back to the pre-r311849 behavior of
not supporting TSO4 for LEM-class at all, which includes not creating
a TSO DMA tag in iflib(9) for devices not having IFCAP_TSO4 set. [2]
- In fact, the FreeBSD TCP stack can handle a TSO size of IP_MAXPACKET
(65535) rather than FREEBSD_TSO_SIZE_MAX (65518). However, the TSO
DMA must have a maxsize of the maximum TSO size plus the size of a
VLAN header for software VLAN tagging. The iflib(9) converted em(4),
thus, first correctly sets scctx->isc_tx_tso_size_max to EM_TSO_SIZE
in em_if_attach_pre(), but later on overrides it with IP_MAXPACKET
in em_setup_interface() (apparently, left-over from pre-iflib(9)
times). So remove the latter and correct iflib(9) to correctly cap
the maximum TSO size reported to the stack at IP_MAXPACKET. While at
it, let iflib(9) use if_sethwtsomax*().
This change includes the addition of isc_tso_max{seg,}size DMA engine
constraints for the TSO DMA tag to struct if_shared_ctx and letting
iflib_txsd_alloc() automatically adjust the maxsize of that tag in case
IFCAP_VLAN_MTU is supported as requested by shurd@.
- Move the if_setifheaderlen(9) call for adjusting the maximum Ethernet
header length from {ixgbe,ixl,ixlv,ixv,em}_setup_interface() to iflib(9)
so adjustment is automatically done in case IFCAP_VLAN_MTU is supported.
As a consequence, this adjustment now is also done in case of bnxt(4)
which missed it previously.
- Move the reduction of the maximum TSO segment count reported to the
stack by the number of m_pullup(9) calls (which in the worst case,
can add another mbuf and, thus, the requirement for another DMA
segment each) in the transmit path for performance reasons from
em_setup_interface() to iflib_txsd_alloc() as these pull-ups are now
done in iflib_parse_header() rather than in the no longer existing
em_xmit(). Moreover, this optimization applies to all drivers using
iflib(9) and not just em(4); all in-tree iflib(9) consumers still
have enough room to handle full size TSO packets. Also, reduce the
adjustment to the maximum number of m_pullup(9)'s now performed in
iflib_parse_header().
- Prior to the conversion of em(4)/igb(4)/lem(4) and ixl(4) to iflib(9)
in r311849 and r335338 respectively, these drivers didn't enable
IFCAP_VLAN_HWFILTER by default due to VLAN events not being passed
through by lagg(4). With iflib(9), IFCAP_VLAN_HWFILTER was turned on
by default but also lagg(4) was fixed in that regard in r203548. So
just remove the now redundant and defunct IFCAP_VLAN_HWFILTER handling
in {em,ixl,ixlv}_setup_interface().
- Nuke other redundant IFCAP_* setting in {em,ixl,ixlv}_setup_interface()
which is (more completely) already done in {em,ixl,ixlv}_if_attach_pre()
now.
- Remove some redundant/dead setting of scctx->isc_tx_csum_flags in
em_if_attach_pre().
- Remove some IFCAP_* duplicated either directly or indirectly (e. g.
via IFCAP_HWCSUM) in {EM,IGB,IXL}_CAPS.
- Don't bother to fiddle with IFCAP_HWSTATS in ixgbe(4)/ixgbev(4) as
iflib(9) adds that capability unconditionally.
- Remove some unused macros from em(4).
- Bump __FreeBSD_version as some of the above changes require the modules
of drivers using iflib(9) to be recompiled.
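
The two TSO size rules from the third item above can be written down
as a small sketch: the size reported to the stack is capped at
IP_MAXPACKET, while the DMA tag maxsize grows by a VLAN header when
software VLAN tagging is possible. The helper names are hypothetical,
and the 18-byte VLAN header length (14-byte Ethernet header plus
4-byte tag) is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IP_MAXPACKET	65535u	/* value quoted in the text */
#define SKETCH_VLAN_HDR_LEN	18u	/* assumed VLAN header size */

/* Maximum TSO size reported to the stack: never above IP_MAXPACKET. */
static uint32_t
tso_size_for_stack(uint32_t drv_tso_size_max)
{
	return (drv_tso_size_max < SKETCH_IP_MAXPACKET ?
	    drv_tso_size_max : SKETCH_IP_MAXPACKET);
}

/* Maxsize for the TSO DMA tag: leave room for a software VLAN header. */
static uint32_t
tso_dma_maxsize(uint32_t drv_tso_size_max, bool vlan_mtu)
{
	return (drv_tso_size_max + (vlan_mtu ? SKETCH_VLAN_HDR_LEN : 0u));
}
```

Keeping the two values separate lets the stack see a protocol limit
while the DMA tag still covers the largest frame the driver may map.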

Okayed by: sbruno@ at 201806 DevSummit Transport Working Group [1]
Reviewed by: sbruno (earlier version), erj
PR: 219428 (part of; comment #10) [1], 220997 (part of; comment #3) [2]
Differential Revision: https://reviews.freebsd.org/D15720


# d5210708 07-May-2018 Matt Macy <mmacy@FreeBSD.org>

Sleep rather than spin in e1000 when doing long running config operations.

With r333218 it is now possible for drivers to use an sx lock and thus sleep while
waiting on long running operations rather than DELAY().

Reported by: gallatin
Reviewed by: sbruno
Approved by: sbruno
MFC after: 1 month
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14984


# 7021bf05 21-Mar-2018 Stephen Hurd <shurd@FreeBSD.org>

Update copyright per Matthew Macy

"Under my tutelage Nicole did 85% of the work. At the time it seemed
simplest for a number of reasons to put my copyright on it. I now consider
that to have been a mistake."

Submitted by: Matthew Macy <mmacy@mattmacy.io>
Reviewed by: shurd
Approved by: shurd
Differential Revision: https://reviews.freebsd.org/D14766


# ac2fffa4 21-Jan-2018 Pedro F. Giffuni <pfg@FreeBSD.org>

Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by: wosch
PR: 225197


# 1b65356b 11-Jan-2018 Eric Joyner <erj@FreeBSD.org>

e1000: Fix typos in value written to register and a comment

The value written to E1000_TARC(0) wasn't intended to have every bit but
E1000_TARC0_CB_MULTIQ_3_REQ cleared; a ~ was missing.
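
The effect of the missing ~ is easy to see in isolation. The mask
value below is a placeholder chosen for the example and is not the
real E1000_TARC0_CB_MULTIQ_3_REQ definition; the function names are
hypothetical.

```c
#include <stdint.h>

/* Placeholder mask; the real definition lives in the e1000 headers. */
#define SKETCH_MULTIQ_3_REQ	(UINT32_C(1) << 28)

/* Without the ~: keeps only the masked bit and clears everything else. */
static uint32_t
tarc_without_tilde(uint32_t tarc)
{
	return (tarc & SKETCH_MULTIQ_3_REQ);
}

/* With the ~: clears only the masked bit and keeps everything else. */
static uint32_t
tarc_with_tilde(uint32_t tarc)
{
	return (tarc & ~SKETCH_MULTIQ_3_REQ);
}
```

For an input of 0xffffffff the first variant yields 0x10000000 and the
second 0xefffffff, which is the difference the commit describes.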

Also change the referenced spec update section in the comment to the correct
section.

Sponsored by: Intel Corporation


# efaa3e07 11-Jan-2018 Pedro F. Giffuni <pfg@FreeBSD.org>

dev/(e1000,ixl): Make some use of mallocarray(9).

Reviewed by: erj
Differential Revision: https://reviews.freebsd.org/D13833


# 6fe4c0a0 28-Dec-2017 Sean Bruno <sbruno@FreeBSD.org>

e1000: Add support for Ice Lake and Cannon Lake

This adds initial support for Ice Lake and Cannon Lake Ethernet devices.

This also addresses errata 1.5.4.4 for Skylake and Kaby Lake devices:
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/i218-i219-ethernet-connection-spec-update.pdf?asset=9561

Submitted by: Kevin Bowling <kevin.bowling@kev009.com>
Relnotes: Yes
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D13660


# 96fc97c8 19-Dec-2017 Stephen Hurd <shurd@FreeBSD.org>

Update Matthew Macy contact info

Email address has changed, uses consistent name (Matthew, not Matt)

Reported by: Matthew Macy <mmacy@mattmacy.io>
Differential Revision: https://reviews.freebsd.org/D13537


# 7282444b 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/dev: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 1c0054d2 05-Oct-2017 Stephen Hurd <shurd@FreeBSD.org>

Fix "taskqgroup_attach: setaffinity failed: 3" with iflib drivers

Improved logging added in r323879 exposed an error during
attach. We need the irq, not the rid to work correctly. em uses
shared irqs, so it will use the same irq for TX as RX. bnxt does
not use shared irqs, or TX irqs at all, so there's no need to set
the TX irq affinity.

Reviewed by: sbruno
Approved by: sbruno (mentor)
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12496


# 916616c4 26-Sep-2017 Conrad Meyer <cem@FreeBSD.org>

Add PNP metadata to more drivers

GPUs: radeonkms, i915kms
NICs: if_em, if_igb, if_bnxt

This metadata isn't used yet, but it will be handy to have later to
implement automatic module loading.

Reviewed by: imp, mmacy
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12488


# c5cf2172 22-Sep-2017 Stephen Hurd <shurd@FreeBSD.org>

Some small packet performance improvements

If the packet is smaller than MTU, disable the TSO flags.
Move TCP header parsing inside the IS_TSO?() test.
Add a new IFLIB_NEED_ZERO_CSUM flag to indicate the checksums need to be zeroed before TX.
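
The first item can be sketched as a flag adjustment on the transmit
path. The flag value and the function name are hypothetical, and
treating a packet of exactly MTU length like a smaller one is an
assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical flag bit standing in for the TSO checksum flags. */
#define SKETCH_CSUM_TSO	(UINT32_C(1) << 0)

/*
 * A packet that already fits within the MTU needs no segmentation,
 * so the TSO request is dropped and later TSO-only parsing is skipped.
 */
static uint32_t
adjust_csum_flags(uint32_t csum_flags, uint32_t pkt_len, uint32_t mtu)
{
	if (pkt_len <= mtu)
		csum_flags &= ~SKETCH_CSUM_TSO;
	return (csum_flags);
}
```

Other flag bits pass through unchanged, so checksum offload requests
are unaffected by this adjustment.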

Reviewed by: sbruno
Approved by: sbruno (mentor)
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12442


# ab2e3f79 15-Sep-2017 Stephen Hurd <shurd@FreeBSD.org>

Revert r323516 (iflib rollup)

This was really too big of a commit even if everything worked, but there
are multiple new issues introduced in the one huge commit, so it's not
worth keeping this until it's fixed.

I'll work on splitting this up into logical chunks and introduce them one
at a time over the next week or two.

Approved by: sbruno (mentor)
Sponsored by: Limelight Networks


# d300df01 12-Sep-2017 Stephen Hurd <shurd@FreeBSD.org>

Roll up iflib commits from github. This pulls in most of the work done
by Matt Macy as well as other changes which he has accepted via pull
request to his github repo at https://github.com/mattmacy/networking/

This should bring -CURRENT and the github repo into close enough sync to
allow small feature branches rather than a large chain of interdependent
patches being developed out of tree. The rest of the synchronization
should be able to be completed on github by splitting the remaining
changes that are not yet ready into short feature branches for later
review as smaller commits.

Here is a summary of changes included in this patch:

1) More checks when INVARIANTS are enabled for earlier problem
detection
2) Group Task Queue cleanups
- Fix use of duplicate shortdesc for gtaskqueue malloc type.
Some interfaces such as memguard(9) use the short description to
identify malloc types, so duplicates should be avoided.
3) Allow gtaskqueues to use ithreads in addition to taskqueues
- In some cases, this can improve performance
4) Better logging when taskqgroup_attach*() fails to set interrupt
affinity.
5) Do not start gtaskqueues until they're needed
6) Have mp_ring enqueue function enter the ABDICATED rather than BUSY
state. This moves the TX to the gtaskq and allows processing to
continue faster as well as make TX batching more likely.
7) Add an ift_txd_errata function to struct if_txrx. This allows
drivers to inspect/modify mbufs before transmission.
8) Add a new IFLIB_NEED_ZERO_CSUM for drivers to indicate they need
checksums zeroed for checksum offload to work. This avoids modifying
packet data in the TX path when possible.
9) Use ithreads for iflib I/O instead of taskqueues
10) Clean up ioctl and support async ioctl functions
11) Prefetch two cache lines from each mbuf instead of one up to 128B. We
often need to parse packet header info beyond 64B.
12) Fix potential memory corruption due to fence post error in
bit_nclear() usage.
13) Improved hang detection and handling
14) If the packet is smaller than MTU, disable the TSO flags.
This avoids extra packet parsing when not needed.
15) Move TCP header parsing inside the IS_TSO?() test.
This avoids extra packet parsing when not needed.
16) Pass chains of mbufs that are not consumed by lro to if_input()
rather than calling if_input() for each mbuf.
17) Re-arrange packet header loads to get as much work as possible done
before a cache stall.
18) Lock the context when calling IFDI_ATTACH_PRE()/IFDI_ATTACH_POST()/
IFDI_DETACH();
19) Attempt to distribute RX/TX tasks across cores more sensibly,
especially when RX and TX share an interrupt. RX will attempt to
take the first threads on a core, and TX will attempt to take
successive threads.
20) Allow iflib_softirq_alloc_generic() to request affinity to the same
cpus an interrupt has affinity with. This allows TX queues to
ensure they are serviced by the socket the device is on.
21) Add new iflib sysctls to net.iflib:
- timer_int - interval at which to run per-queue timers in ticks
- force_busdma
22) Add new per-device iflib sysctls to dev.X.Y.iflib
- rx_budget allows tuning the batch size on the RX path
- watchdog_events Count of watchdog events seen since load
23) Fix error where netmap_rxq_init() could get called before
IFDI_INIT()
24) e1000: Fixed version of r323008: post-cold sleep instead of DELAY
when waiting for firmware
- After interrupts are enabled, convert all waits to sleeps
- Eliminates e1000 software/firmware synchronization busy waits after
startup
25) e1000: Remove special case for budget=1 in em_txrx.c
- Premature optimization which may actually be incorrect with
multi-segment packets
26) e1000: Split out TX interrupt rather than share an interrupt for
RX and TX.
- Allows better performance by keeping RX and TX paths separate
27) e1000: Separate igb from em code where suitable
Much easier to understand separate functions and "if (is_igb)" than
previous tests like "if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))"

#blamebruno

Reviewed by: sbruno
Approved by: sbruno (mentor)
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12235


# dc63dc00 08-Sep-2017 Konstantin Belousov <kib@FreeBSD.org>

Fix malloc() uses in em_get_regs().

Do not use malloc(M_NOWAIT), wait is possible there, and the malloc
failures were not checked. Do not forget to free malloced memory.

Reported and tested by: pho
Approved by: sbruno
Sponsored by: The FreeBSD Foundation


# a9693502 30-Aug-2017 Sean Bruno <sbruno@FreeBSD.org>

Revert r323008 and its conversion of e1000/iflib to using SX locks.

This seems to be missing something on the 82574L causing NFS root mounts
to hang.

Reported by: kib


# e17e5b41 29-Aug-2017 Sean Bruno <sbruno@FreeBSD.org>

Continuation of lock cleanup in e1000.

Post-cold sleep instead of DELAY when waiting for firmware.

Convert softc mutex to an SX lock. Change all waits to sleeps
once interrupts are enabled (and it is safe to sleep).

Submitted by: Matt Macy <matt@mattmacy.io>
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12101


# e023501c 28-Aug-2017 Marius Strobl <marius@FreeBSD.org>

Don't set any WOL enabling hardware bits if WOL isn't requested
according to the enabled interface capability bits. Also remove
some dead code, which tried to preserve already set contents of
E1000_WUC while that register is completely overwritten shortly
after in all cases.


# 7d0d6484 25-Aug-2017 Sean Bruno <sbruno@FreeBSD.org>

Add a different #define for the maximum number of transmit and
receive descriptors for the igb(4) class of devices. This will
allow a better definition for maximum going forward. Some igb(4)
devices support more than the default 4K.

Reported by: Jason (j@nitrology.com)
Sponsored by: Limelight Networks


# 5eedcb09 21-Jul-2017 Sean Bruno <sbruno@FreeBSD.org>

Do not update stats counter in SWI context. Defer to the already existing
admin thread.

Submitted by: Matt Macy <mmacy@mattmacy.io>
Sponsored by: Limelight Networks


# 2df7231d 19-Jul-2017 Enji Cooper <ngie@FreeBSD.org>

Some trivial style(9) fixes

- Delete trailing whitespace.
- Fix leading indentation (convert single column spaces to tabs).
- Convert "[Ff]all through" to "FALLTHROUGH", per implicit project
style/spelling.

Reviewed by: sbruno
Differential Revision: D11665


# d8c2808f 19-Jul-2017 Sean Bruno <sbruno@FreeBSD.org>

Restore igb(4) code dropped during iflib conversion
- restore newer code for vf, i350, i210, i211
- restore dmac init code for i354 and i350
- restore WUC/WUFC update
- check for igb mac type before attempting to assert
a media changed event.
- handle link events for igb(4) and em(4) devices differently
and appropriately for their respective model types.

Submitted by: Matt Macy <mmacy@mattmacy.io>
Sponsored by: Limelight Networks


# 60596476 06-Apr-2017 Sean Bruno <sbruno@FreeBSD.org>

Move pause frame counter out of struct if_ctx and into struct if_softc_ctx_t
so that we can use it in iflib to detect pause frames.

The igb(4) driver definitely used to use this in its old timer function and
I see no reason to restrict it to that driver only.

Sponsored by: Limelight Networks


# fd70242d 04-Apr-2017 Sean Bruno <sbruno@FreeBSD.org>

no_desc_avail is tracked in iflib now making this redundant.

Sponsored by: Limelight Networks


# 548b549a 03-Apr-2017 Sean Bruno <sbruno@FreeBSD.org>

Remove unsafe and non-functional DDB functions that I added long ago
for debugging.


# c45420cc 03-Apr-2017 Sean Bruno <sbruno@FreeBSD.org>

Remove rx_processing_limit sysctl and now orphaned function em_set_sysctl_value

Sponsored by: Limelight Networks


# 935ca1ae 27-Mar-2017 Sean Bruno <sbruno@FreeBSD.org>

Access *correct* ifp data structure when debug sysctl is invoked.

Submitted by: Kevin Bowling <kevin.bowling@kev009.com>
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D10126


# cb101e12 24-Mar-2017 Sean Bruno <sbruno@FreeBSD.org>

Add missing 'else' to conditional. This doesn't really affect the code
flow or configuration in any way.


# fdb25f38 24-Mar-2017 Sean Bruno <sbruno@FreeBSD.org>

Add missing 'else' to 3-state conditional during setup of interrupts.

We don't want to overwrite the 82574 interrupt setup with a different
configuration.

PR: 218041
Submitted by: razmyslov@viva64.com


# b207ed2b 15-Mar-2017 Sean Bruno <sbruno@FreeBSD.org>

Actually set the MTU to the requested value and fixup handling of jumbo
frames.

Submitted by: Matt Macy <mmacy@nextbsd.org>
Reported by: pho
Sponsored by: Limelight Networks


# 95246abb 13-Mar-2017 Sean Bruno <sbruno@FreeBSD.org>

IFLIB updates
- unconditionally enable BUS_DMA on non-x86 architectures
- speed up rxd zeroing via customized function
- support out of order updates to rxd's
- add prefetching to hardware descriptor rings
- only prefetch on 10G or faster hardware
- add separate tx queue intr function
- preliminary rework of NETMAP interfaces, WIP

Submitted by: Matt Macy <mmacy@nextbsd.org>
Sponsored by: Limelight Networks


# 38b7de95 18-Feb-2017 Sean Bruno <sbruno@FreeBSD.org>

Restore PBA setup for igb(4) class devices.

Do not write to PBA register on igb(4) devices unless we need
to make adjustments for the 82575 and jumbo frames.

Remove redundant LPE/~LPE assignments.

Move e1000_lv_jumbo_workaround_ich8lan() invocation into a block
so that it's not executed in the igb case.

Move em(4) class assignments of RCTL values to its own code block.

Adjust a few direct accesses of ifp->mtu to use accessor functions.

PR: 216734
Submitted by: Kaho Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>


# 9c030a21 17-Feb-2017 Sean Bruno <sbruno@FreeBSD.org>

Push max_frame_len back into iflib so that jumbo frame sizes work.

Reported by: pho


# bcc537c5 12-Feb-2017 Sean Bruno <sbruno@FreeBSD.org>

Only trigger em_local_timer on queue index 0. This was causing continuous
em_local_timer() executions during normal operation and was very likely
to cause a lock up on igb(4) devices.

Submitted by: Matt Macy (mmacy@nextbsd.org)
Reported by: jtl
Reviewed by: gallatin
Sponsored by: Limelight Networks & Netflix


# 2a3c5de4 09-Feb-2017 Andrew Turner <andrew@FreeBSD.org>

Add support for the Intel 82572EI back to em(4), it seems it was dropped
when moving to iflib.

Reviewed by: sbruno
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D9511


# d0b2cad1 31-Jan-2017 Stephen J. Kiernan <stevek@FreeBSD.org>

Add the following set accessor functions for recently-added members of ifnet
structure:

if_gethwtsomax(), if_sethwtsomax() - if_hw_tsomax
if_gethwtsomaxsegcount(), if_sethwtsomaxsegcount() - if_hw_tsomaxsegcount
if_gethwtsomaxsegsize(), if_sethwtsomaxsegsize() - if_hw_tsomaxsegsize

Update em and vnic drivers which had already been converted to use accessor
functions for the other ifnet structure members.

Reviewed by: erj
Approved by: sjg (mentor)
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D8544


# db569544 22-Jan-2017 Sean Bruno <sbruno@FreeBSD.org>

igb(4) enable WOL features for this class of devices.

PR: 208343
Submitted by: Kaho Tashikazu <kaho@elam.kais.kyoto-u.ac.jp>


# 295df609 19-Jan-2017 Eric Joyner <erj@FreeBSD.org>

e1000: Add support for Kaby Lake generation i219 (4) and i219 (5) devices

MFC after: 1 week
Sponsored by: Intel Corporation


# 653e35e6 12-Jan-2017 Sean Bruno <sbruno@FreeBSD.org>

Restore fixup for newer em(4) devices WOL capabilities post iflib integration.

PR: 208343


# 0aa7d3ff 12-Jan-2017 Sean Bruno <sbruno@FreeBSD.org>

Reset the EIAC register to include the LINK status bit and restore
link up/down notifications.

Submitted by: Franco Fichtner <franco@opnsense.org>


# 8bc3dfc4 12-Jan-2017 Sean Bruno <sbruno@FreeBSD.org>

Attempt to use the "new" BAR address for newer igb(4) devices. This code
was dropped during the IFLIB migration.

Reported by: olivier
Reviewed by: mmacy@nextbsd.org


# d37cece2 09-Jan-2017 Sean Bruno <sbruno@FreeBSD.org>

Add copyright notices, 2-clause BSD.

Reported by: jmallett


# f2d6ace4 09-Jan-2017 Sean Bruno <sbruno@FreeBSD.org>

Migrate e1000 to the IFLIB framework:
- em(4) igb(4) and lem(4)
- deprecate the igb device from kernel configurations
- create a symbolic link in /boot/kernel from if_em.ko to if_igb.ko

Devices tested:
- 82574L
- I218-LM
- 82546GB
- 82579LM
- I350
- I217

Please report problems to freebsd-net@freebsd.org

Partial review from jhb and suggestions on how to *not* brick folks who
originally would have lost their igbX device.

Submitted by: mmacy@nextbsd.org
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Limelight Networks and Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8299


# 751e3554 14-Nov-2016 Sean Bruno <sbruno@FreeBSD.org>

Update WOL support for newer em(4) devices.

Do not overwrite the contents of the WUC register, add E1000_WUC_PME_EN
to the register contents, leaving the default contents intact.

PR: 208343
Submitted by: Kaho Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
Reviewed by: jeffrey piper <jeffrey.e.pieper@intel.com>
Approved by: erj@
MFC after: 2 weeks


# b1b6afa5 05-Nov-2016 Sean Bruno <sbruno@FreeBSD.org>

r295133 attempted to deactivate TSO in the 100Mbit link case with this
adapter to work around bugs in TSO handling at this speed.

em_init_locked is called during first boot of the adapter and will
see that link_speed is uninitialized, effectively turning off tso for
all cards at all speeds, which I believe was *not* the intent.

Move the handling of TSO deactivation to the link handler where we can
more effectively make the decision about what to do. In addition,
completely purge the TSO capabilities instead of disabling just CSUM_TSO.

Thanks to jhb for explanation of the hw capabilities api.

Thanks to royger and cognet for testing the 100Mbit failure case to
ensure that their adapters do indeed still work.

MFC after: 1 week
Sponsored by: Limelight Networks


# 7846f73c 02-Nov-2016 Sean Bruno <sbruno@FreeBSD.org>

Removed unused M_TSO_LEN.

MFC after: 2 weeks


# e760e292 15-Aug-2016 Sean Bruno <sbruno@FreeBSD.org>

e1000: Add support for Kaby Lake IDs

Fixup some errors when transitioning to/from low power states.

Submitted by: erj
Reviewed by: Jeffery Piper (jeffrey.e.piper@intel.com)
MFC after: 3 days
Relnotes: yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D7478


# e72a746a 01-Aug-2016 Sean Bruno <sbruno@FreeBSD.org>

r293331 mistakenly failed to add an assignment of paddr to the rxbuf
but only in the NETMAP code. This led to the NETMAP code paths
passing nothing up to userland.

Submitted by: Ad Schellevis <ad@opnsense.org>
Reported by: Franco Fichtner <franco@opnsense.org>
MFC after: 1 day


# 761e5261 06-Jul-2016 Sean Bruno <sbruno@FreeBSD.org>

Do not initialize the adapter on MTU change when adapter status is down.
This fixes long-standing problems when changing settings of the adapter.

Discussed in:
https://lists.freebsd.org/pipermail/freebsd-net/2016-June/045509.html

Submitted by: arnaud.ysmal@stormshield.eu
Reviewed by: erj@freebsd.org
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D7030


# 363089d8 06-May-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

dev/e1000,ixgbe: minor spelling fixes.

No functional change.

Differential Revision: https://reviews.freebsd.org/D6177


# ae6bd5b7 18-Apr-2016 Sean Bruno <sbruno@FreeBSD.org>

Correct possible underflow conditions when checking for available space
in the tx h/w ring buffer.
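
As an illustration of the class of bug named above (not the actual diff,
which is not reproduced in this log), the sketch below shows how an
unsigned "space left" check can underflow and how ordering the comparison
avoids it. All names (ring_free_slots, ring_has_room_buggy, ring_has_room)
and the "reserve" parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Free slots in a ring of `size` descriptors. `head` is the next slot to
 * be cleaned, `tail` the next slot to be filled; one slot is kept unused
 * so that a full ring can be told apart from an empty one.
 */
static uint32_t
ring_free_slots(uint32_t head, uint32_t tail, uint32_t size)
{
	assert(size > 1 && head < size && tail < size);
	uint32_t used = (tail >= head) ? tail - head : size - head + tail;
	return (size - 1 - used);
}

/* Buggy form: when needed > avail the unsigned subtraction wraps around. */
static bool
ring_has_room_buggy(uint32_t avail, uint32_t needed, uint32_t reserve)
{
	return (avail - needed > reserve);
}

/* Safe form: rule out the wrap before subtracting. */
static bool
ring_has_room(uint32_t avail, uint32_t needed, uint32_t reserve)
{
	return (avail >= needed && avail - needed > reserve);
}
```

With avail = 2, needed = 5 and reserve = 2, the buggy form reports room
because 2 - 5 wraps to a very large unsigned value, while the safe form
correctly reports none.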

Reviewed by: gnn jeb.j.cramer@intel.com
MFC after: 1 week
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D5918


# 0a01ff4d 22-Feb-2016 Marius Strobl <marius@FreeBSD.org>

Fix and clean up usage of DMA and TSO segments:
- At Intel it is believed that most of their products support "only"
40 DMA segments so lower {EM,IGB}_MAX_SCATTER accordingly. Actually,
40 is more than plenty to handle full size TSO packets so it doesn't
make sense to further distinguish between MAC variants that really
can do 64 DMA segments. Moreover, capping at 40 DMA segments limits
the stack usage of {em,igb}_xmit() that - given the rare use of more
than these - previously hardly was justifiable, while still being
sufficient to avoid the problems seen with em(4) and EM_MAX_SCATTER
set to 32.
- In igb(4), pass the actually supported TSO parameters up the stack.
Previously, the defaults set in if_attach_internal() were applied,
i. e. a maximum of 35 TSO segments, which made supporting more than
these in the driver pointless. However, this might explain why no
problems were seen with IGB_MAX_SCATTER at 64.
- In em(4), take the 5 m_pullup(9) invocations performed by em_xmit()
in the TSO case into account when reporting TSO parameters upwards.
In the worst case, each of these calls will add another mbuf and,
thus, the requirement for an additional DMA segment. So for best
performance, it doesn't make sense to advertize a maximum of TSO
segments that typically will require defragmentation in em_xmit().
Again, this leaves enough room to handle full size TSO packets.
- Drop TSO macros from if_lem.h given that corresponding MACs don't
support TSO in the first place.
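
The segment arithmetic in the list above can be sketched as follows. The
cap of 40 DMA segments and the 5 m_pullup(9) calls come from the commit
text; the macro and function names are hypothetical, and the 4096-byte
segment size is only an assumption used to show that a full size (65535
byte) TSO payload still fits.

```c
#include <assert.h>

/* Values from the commit text; names are hypothetical. */
#define	EX_MAX_SCATTER	40	/* DMA segments the driver will map */
#define	EX_TSO_PULLUPS	5	/* worst-case extra mbufs from m_pullup(9) */

/* TSO segment count to report up the stack, leaving room for pullups. */
static int
ex_tso_max_segs(int max_scatter, int pullups)
{
	assert(max_scatter > pullups);
	return (max_scatter - pullups);
}

/* Segments needed for `payload` bytes at `seg_size` bytes per segment. */
static int
ex_segs_needed(int payload, int seg_size)
{
	assert(payload >= 0 && seg_size > 0);
	return ((payload + seg_size - 1) / seg_size);
}
```

Under these assumptions the reported maximum is 35 segments, and a 65535
byte payload split into 4096-byte segments needs 16, so defragmentation
in the transmit path should rarely be required.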

Reviewed by: erj, sbruno, jeffrey.e.pieper_intel.com
Approved by: erj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D5238


# c80429ce 05-Feb-2016 Eric Joyner <erj@FreeBSD.org>

Update em(4) to 7.6.1; update igb(4) to 2.5.3.

Major changes:

- Add i219/i219(2) hardware support. (Found on Skylake generation and newer
chipsets.)
- Further to the last Skylake support diff, this one also includes support for
the Lewisburg chipset (i219(3)).

- Add a workaround to an igb hardware errata.
All 1G server products need to have IPv6 extension header parsing turned off.
This should be listed in the specification updates for current 1G server
products, e.g. for i350 it's errata #37 in this document:
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/ethernet-controller-i350-spec-update.pdf

- Avoton (i354) PHY errata workaround added

And a bunch of minor fixes, as well as #defines for things that the current
em(4)/igb(4) drivers don't implement.

Differential Revision: https://reviews.freebsd.org/D3162
Reviewed by: sbruno, marius, gnn
Approved by: gnn
MFC after: 2 weeks
Sponsored by: Intel Corporation


# d233a81b 01-Feb-2016 Marius Strobl <marius@FreeBSD.org>

As it turns out, one of the more or less recent changes to em(4)
causes watchdog timeouts when using TSO4 at link speeds below
Gigabit, at least with 82573E. So disable the assist automatically
when at lower speeds.

Submitted by: jfv
Approved by: erj
Obtained from: D3162
MFC after: 3 days


# e12a9f25 13-Jan-2016 Marius Strobl <marius@FreeBSD.org>

Given that em(4), lem(4) and igb(4) hardware doesn't require the
alignment guarantees provided by m_defrag(9), use m_collapse(9)
instead for performance reasons.
While at it, sanitize the statistics softc members, i. e. retire
unused ones and add SYSCTL nodes missing for actually used ones.

Differential Revision: https://reviews.freebsd.org/D4717


# 676822ac 07-Jan-2016 Sean Bruno <sbruno@FreeBSD.org>

Disable the reuse of checksum offload context descriptors in the case
of multiple queues in em(4). Document errata in the code.

MFC after: 2 weeks
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D3995


# b834dcea 07-Jan-2016 Sean Bruno <sbruno@FreeBSD.org>

Switch em(4) to the extended RX descriptor format. This matches the
e1000/e1000e split in linux.

Split rxbuffer and txbuffer apart to support the new RX descriptor format
structures. Move rxbuffer manipulation to em_setup_rxdesc() to unify the
new behavior changes.

Add a RSSKEYLEN macro for help in generating the RSSKEY data structures
in the card.

Change em_receive_checksum() to process the new rxdescriptor format
status bit.

MFC after: 2 weeks
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D3447


# 8061e8bb 07-Jan-2016 Sean Bruno <sbruno@FreeBSD.org>

Wow, um ... sorry about that. The commit log for this code should have
read that it was for EM_MULTIQUEUE. Revert this and try again.


# 712b97a6 07-Jan-2016 Sean Bruno <sbruno@FreeBSD.org>

Switch em(4) to the extended RX descriptor format. This matches the
e1000/e1000e split in linux.

MFC after: 2 weeks
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D3447


# e373323f 19-Sep-2015 Sean Bruno <sbruno@FreeBSD.org>

Revert 287914,287762.

Reports of breakage on igb(4) have been narrowed down to 287762 and 287914
is a dependent change.

Submitted by: erj


# 25fd5dd9 17-Sep-2015 Sean Bruno <sbruno@FreeBSD.org>

Add Intel Skylake/I219 Support
- New em(4) device in currently shipping products

Differential Revision: https://reviews.freebsd.org/D3163
Submitted by: erj@freebsd.org
Reviewed by: jfv@freebsd.org
MFC after: 2 weeks
Sponsored by: Intel Corporation


# c597a019 05-Sep-2015 Sean Bruno <sbruno@FreeBSD.org>

Revert last two commits to em(4)/igb(4). Reports are coming in that
this breaks initialization and reads from EEPROM on boot/driver load.

r287469 is being reverted as a dependency on r287467


# 98ae230f 04-Sep-2015 Sean Bruno <sbruno@FreeBSD.org>

em(4): Add Skylake/I219 support.
- driver rev 7.5.2
- use new functions em_flush* for i219 devices

Differential Revision: https://reviews.freebsd.org/D3163
Submitted by: erj jfv
Reviewed by: jfv
MFC after: 1 month
Relnotes: Yes
Sponsored by: Intel Corporation


# fac82436 31-Aug-2015 Sean Bruno <sbruno@FreeBSD.org>

Restrict tso_max to IP_MAXPACKET to avoid the panic reported in:
https://lists.freebsd.org/pipermail/freebsd-current/2015-August/057192.html

Submitted by: pyunyh@gmail.com
MFC after: 2 weeks


# df40405f 16-Aug-2015 Sean Bruno <sbruno@FreeBSD.org>

Increase EM_MAX_SCATTER to 64 such that the size of em_xmit()::segs[EM_MAX_SCATTER]
doesn't get overrun by things like NFS that can and do shove more than 32 segs when
being used with em(4) and TSO4.

Update tso handling code in em_xmit() with update from jhb@ in email thread:
https://lists.freebsd.org/pipermail/freebsd-net/2014-July/039306.html

set ifp->if_hw_tsomax, ifp->if_hw_tsomaxsegcount & ifp->if_hw_tsomaxsegsize
to appropriate values.

Define a TSO workaround "magic" number of 4 that is used to avoid an
alignment issue in hardware.

Change a couple of integer values that were used as booleans to actual
bool types.

Ensure that em_enable_intr() enables the appropriate mask of interrupts
and not just a hardcoded define of values.

PR: 200221 199174 195078
Differential Revision: https://reviews.freebsd.org/D3192
Reviewed by: erj jhb hiren
MFC after: 2 weeks
Sponsored by: Limelight Networks


# 38be29d3 16-Aug-2015 Sean Bruno <sbruno@FreeBSD.org>

Add capability to disable CRC stripping. This breaks IPMI/BMC capabilities on certain adapters.
Linux has been doing the exact same thing since 2008

https://github.com/torvalds/linux/commit/eb7c3adb1ca92450870dbb0d347fc986cd5e2af4

PR: 161277
Differential Revision: https://reviews.freebsd.org/D3282
Submitted by: Fravadona@gmail.com
Reviewed by: erj wblock
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Limelight Networks


# 577c3413 01-Aug-2015 Hans Petter Selasky <hselasky@FreeBSD.org>

Free mbufs when busdma loading fails.

Reviewed by: erj, sbruno
MFC after: 1 month


# a82cd516 25-Jul-2015 Sean Bruno <sbruno@FreeBSD.org>

Remove unused txd_saved.

Initialize txd_upper, txd_lower and txd_used at declaration.

Differential Revision: D3174
Reviewed by: erj hiren
MFC after: 2 weeks
Sponsored by: Limelight Networks


# f46fb03d 16-Jul-2015 Sean Bruno <sbruno@FreeBSD.org>

Add an adapter CORE lock in the DDB hook em_dump_queue to avoid WITNESS
panic in em_init_locked() while debugging.

MFC after: 2 weeks
Sponsored by: Limelight Networks


# 847bf383 09-Jul-2015 Luigi Rizzo <luigi@FreeBSD.org>

Sync netmap sources with the version in our private tree.
This commit contains large contributions from Giuseppe Lettieri and
Stefano Garzarella, is partly supported by grants from Verisign and Cisco,
and brings in the following:

- fix zerocopy monitor ports and introduce copying monitor ports
(the latter are lower performance but give access to all traffic
in parallel with the application)

- exclusive open mode, useful to implement solutions that recover
from crashes of the main netmap client (suggested by Patrick Kelsey)

- revised memory allocator in preparation for the 'passthrough mode'
(ptnetmap) recently presented at bsdcan. ptnetmap is described in
S. Garzarella, G. Lettieri, L. Rizzo;
Virtual device passthrough for high speed VM networking,
ACM/IEEE ANCS 2015, Oakland (CA) May 2015
http://info.iet.unipi.it/~luigi/research.html

- fix rx CRC handling on ixl

- add module dependencies for netmap when building drivers as modules

- minor simplifications to device-specific routines (*txsync, *rxsync)

- general code cleanup (remove unused variables, introduce macros
to access rings and remove duplicate code)

Applications do not need to be recompiled, unless of course
they want to use the new features (monitors and exclusive open).

Those willing to try this code on stable/10 can just update the
sys/dev/netmap/*, sys/net/netmap* with the version in HEAD
and apply the small patches to individual device drivers.

MFC after: 1 month
Sponsored by: (partly) Verisign, Cisco


# 23c9098b 03-Jun-2015 Sean Bruno <sbruno@FreeBSD.org>

Change EM_MULTIQUEUE to a real kernconf entry and enable support for
up to 2 rx/tx queues for the 82574.

Program the 82574 to enable 5 msix vectors, assign 1 to each rx queue,
1 to each tx queue and 1 to the link handler.

Inspired by DragonFlyBSD, enable some RSS logic for tx queue
handling/processing.

Move multiqueue handler functions so that they line up better in a diff
review to if_igb.c

Always enqueue tx work to be done in em_mq_start, if unable to acquire
the TX lock, then this will be processed in the background later by the
taskqueue. Remove mbuf argument from em_start_mq_locked() as the work
is always enqueued. (stolen from igb)

Setup TARC, TXDCTL and RXDCTL registers for better performance and stability
in multiqueue and singlequeue implementations. Handle Intel errata 3 and
generic multiqueue behavior with the initialization of TARC(0) and TARC(1)

Bind interrupt threads to cpus in order. (stolen from igb)

Add 2 new DDB functions, one to display the queue(s) and their settings and
one to reset the adapter. Primarily used for debugging.

In the multiqueue configuration, bump RXD and TXD ring size to max for the
adapter (4096). Setup an RDTR of 64 and an RADV of 128 in multiqueue configuration
to cut down on the number of interrupts. RADV was arbitrarily set to 2x RDTR
and can be adjusted as needed.

Cleanup the display in top a bit to make it clearer where the taskqueue threads
are running and what they should be doing.

Ensure that both queues are processed by em_local_timer() by writing them both
to the IMS register to generate soft interrupts.

Ensure that a soft interrupt is generated when em_msix_link() is run so that
any races between assertion of the link/status interrupt and a rx/tx interrupt
are handled.

Document existing tunables: hw.em.eee_setting, hw.em.msix, hw.em.smart_pwr_down, hw.em.sbp

Document use of hw.em.num_queues and the new kernel option EM_MULTIQUEUE

Thanks to Intel for their continued support of FreeBSD.

Reviewed by: erj jfv hiren gnn wblock
Obtained from: Intel Corporation
MFC after: 2 weeks
Relnotes: Yes
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D1994


# b7a728aa 02-Jun-2015 Sean Bruno <sbruno@FreeBSD.org>

Simplify hang detection by stealing the techniques used in ixl(4) and
applying them to em(4).

Rely on iterations through the local timer, and the tx queue state to
determine if an actual hang has occurred. Any time a descriptor is used
(packet sent), the tx queue is flagged as busy. Then when txeof runs, it
either clears the flag when all is clean, or resets it to 1 if ANY are
cleaned, if nothing is cleaned it increments the flag.

Local timer simply checks to see if busy ever reaches MAX (10, which
is compile time configurable), and then sets it as HUNG, at that point
there is one more timer cycle in which to have any cleans, if not a
watchdog reset will occur.
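
The busy-counter scheme described above can be sketched as a small state
machine. This is an illustrative userland sketch under stated
assumptions, not the driver code: the type and function names
(txq_state, txq_on_xmit, txq_on_txeof, txq_on_timer) and the constant
names are hypothetical; only the threshold of 10 comes from the commit
text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; only the threshold of 10 is from the commit text. */
enum {
	TXQ_IDLE = 0,			/* nothing outstanding */
	TXQ_MAX_BUSY = 10,		/* busy count tolerated by the timer */
	TXQ_HUNG = 0x7fffffff		/* sentinel: queue flagged as hung */
};

struct txq_state {
	int busy;
};

/* A descriptor was used (packet sent): flag the queue as busy. */
static void
txq_on_xmit(struct txq_state *q)
{
	if (q->busy == TXQ_IDLE)
		q->busy = 1;
}

/* Transmit cleanup ran: `cleaned` descriptors were reclaimed. */
static void
txq_on_txeof(struct txq_state *q, int cleaned, bool ring_empty)
{
	assert(cleaned >= 0);
	if (cleaned == 0) {
		/* No progress: count another busy iteration. */
		if (q->busy != TXQ_HUNG)
			q->busy++;
	} else {
		/* Any progress resets the count to 1. */
		q->busy = 1;
	}
	/* Everything is clean: clear the flag. */
	if (ring_empty)
		q->busy = TXQ_IDLE;
}

/* Periodic timer: returns true when a watchdog reset should occur. */
static bool
txq_on_timer(struct txq_state *q)
{
	if (q->busy == TXQ_HUNG)
		return (true);
	if (q->busy >= TXQ_MAX_BUSY)
		q->busy = TXQ_HUNG;	/* one more cycle of grace */
	return (false);
}
```

In this sketch a queue that makes no progress is first marked hung by
one timer pass and only reset by the next, which mirrors the "one more
timer cycle" of grace described in the commit message.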

Differential Revision: https://reviews.freebsd.org/D2019
Submitted by: jfv
Reviewed by: hiren
Obtained from: Intel Corporation
MFC after: 2 weeks
Relnotes: Yes
Sponsored by: Limelight Networks


# 316f4c88 22-May-2015 Sean Bruno <sbruno@FreeBSD.org>

Bump rx_overruns when indicated by the ICR mask.

PR: 199716
MFC after: 3 days
Sponsored by: Limelight Networks


# f0188618 21-Oct-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Fix multiple incorrect SYSCTL arguments in the kernel:

- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.
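
The "logical OR where binary OR was expected" item in the list above is
easy to reproduce in isolation. The sketch below uses hypothetical flag
names (EX_FLAG_RD, EX_FLAG_TUN) with illustrative values; the real
CTLFLAG_* definitions live in <sys/sysctl.h> and are not reproduced here.

```c
#include <assert.h>

/* Hypothetical flags with illustrative values. */
#define	EX_FLAG_RD	0x80000000u
#define	EX_FLAG_TUN	0x00080000u

/* Wrong: logical OR collapses both flags into the truth value 1. */
static unsigned int
ex_access_wrong(void)
{
	return (EX_FLAG_RD || EX_FLAG_TUN);
}

/* Right: bitwise OR keeps both flag bits set. */
static unsigned int
ex_access_right(void)
{
	return (EX_FLAG_RD | EX_FLAG_TUN);
}
```

The wrong form still compiles, which is why a compile-time assertion on
the "access" argument, as the commit describes, is a useful guard.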

MFC after: 3 days
Sponsored by: Mellanox Technologies


# bd071d4d 28-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

- Remove empty wrappers ether_poll_[de]register_drv(). [1]
- Move polling(9) declarations out of ifq.h back to if_var.h
they are absolutely unrelated to queues.

Submitted by: Mikhail <mp lenta.ru> [1]


# df360178 18-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

- Use if_inc_counter() to increment various counters.
- Do not ever set a counter to a value. For those counters
that we don't increment, but return directly from hardware
create cases in if_get_counter() method.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 09a8241f 30-Aug-2014 Gleb Smirnoff <glebius@FreeBSD.org>

It is actually possible to have if_t a typedef to non-void type,
and keep both converted to drvapi and non-converted drivers
compilable.

o Make if_t typedef to struct ifnet *.
o Remove shim functions.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 4bf50f18 16-Aug-2014 Luigi Rizzo <luigi@FreeBSD.org>

Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.

Basically no user API changes (some bugfixes in sys/net/netmap_user.h).

In detail:

1. netmap support for virtio-net, including in netmap mode.
Under bhyve and with a netmap backend [2] we reach over 1Mpps
with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.

2. (kernel) add support for multiple memory allocators, so we can
better partition physical and virtual interfaces giving access
to separate users. The most visible effect is one additional
argument to the various kernel functions to compute buffer
addresses. All netmap-supported drivers are affected, but changes
are mechanical and trivial

3. (kernel) simplify the prototype for *txsync() and *rxsync()
driver methods. All netmap drivers affected, changes mostly mechanical.

4. add support for netmap-monitor ports. Think of it as a mirroring
port on a physical switch: a netmap monitor port replicates traffic
present on the main port. Restrictions apply. Drive carefully.

5. if_lem.c: support for various paravirtualization features,
experimental and disabled by default.
Most of these are described in our ANCS'13 paper [1].
Paravirtualized support in netmap mode is new, and beats the
numbers in the paper by a large factor (under qemu-kvm,
we measured guest-host throughput up to 10-12 Mpps).

A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.

Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.

This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.

A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.

MFC after: 3 days.


# e2ade3b6 15-Jul-2014 Rick Macklem <rmacklem@FreeBSD.org>

Move the "retry:" label so that the calls to m_pullup() are
not done after the call to m_defrag(). This fixes a problem
where m_pullup() would prepend an mbuf to the list created
by m_defrag() making the chain greater than 32 again.

Tested by: rcarter@pinyon.org
Reviewed by: yongari, jfv
MFC after: 2 weeks


# 58e65495 10-Jul-2014 Mark Johnston <markj@FreeBSD.org>

Correct the setting of the VID in transmit descriptors when hardware VLAN
tagging is enabled. This was broken in r266978.

Reported by: gjb
Tested by: gjb


# af3b2549 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Pull in r267961 and r267973 again. Fix for issues reported will follow.


# 37a107a4 27-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Revert r267961, r267973:

These changes prevent sysctl(8) from returning proper output,
such as:

1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory


# 3da1cf1e 27-Jun-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating children of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading kludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after: 2 weeks
Sponsored by: Mellanox Technologies


# 8cc64f1e 26-Jun-2014 Jack F Vogel <jfv@FreeBSD.org>

Sync the E1000 shared code with Intel internal, this adds fixes,
and more importantly, new I218 adapter support to the em driver.

MFC after: 1 week


# 46e89834 12-Jun-2014 John Baldwin <jhb@FreeBSD.org>

- Don't compare bus_dma map pointers for static DMA allocations against
NULL to determine if bus_dmamap_unload() or bus_dmamem_free() should be
called. Instead, check the associated bus and virtual addresses.
- Don't clear static DMA maps to NULL.
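
The teardown pattern described in the two items above can be sketched in
userland as follows. This is an assumption-laden illustration, not the
driver code: struct ex_dma_alloc and the ex_* functions are hypothetical
stand-ins for the driver's DMA record and for bus_dmamap_unload() and
bus_dmamem_free(), and treating a bus address of 0 as "never loaded" is
an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver's static DMA allocation record. */
struct ex_dma_alloc {
	uint64_t paddr;	/* bus address; 0 is assumed to mean "not loaded" */
	void *vaddr;	/* virtual address; NULL means "not allocated" */
	void *map;	/* opaque map; may be NULL, so it is never tested */
};

static int ex_unload_calls, ex_free_calls;

/* Stubs standing in for the unload and free operations. */
static void
ex_dmamap_unload(struct ex_dma_alloc *d)
{
	(void)d;
	ex_unload_calls++;
}

static void
ex_dmamem_free(struct ex_dma_alloc *d)
{
	(void)d;
	ex_free_calls++;
}

/* Decide what to release from the addresses, not from the map pointer. */
static void
ex_dma_free(struct ex_dma_alloc *d)
{
	if (d->paddr != 0) {
		ex_dmamap_unload(d);
		d->paddr = 0;
	}
	if (d->vaddr != NULL) {
		ex_dmamem_free(d);
		d->vaddr = NULL;
	}
	/* d->map is deliberately left untouched. */
}
```

Because the addresses are cleared after release, calling the teardown
twice is harmless in this sketch even when the map pointer is NULL.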

Reviewed by: jfv


# 9e115290 02-Jun-2014 Marcel Moolenaar <marcel@FreeBSD.org>

Convert em(4) to use the driver API.

Submitted by: Anuranjan Shukla <anshukla@juniper.net>
Obtained from: Juniper Networks, Inc.


# 17885a7b 05-Jan-2014 Luigi Rizzo <luigi@FreeBSD.org>

It is 2014 and we have a new version of netmap.
Most relevant features:

- netmap emulation on any NIC, even those without native netmap support.

On the ixgbe we have measured about 4Mpps/core/queue in this mode,
which is still a lot more than with sockets/bpf.

- seamless interconnection of VALE switch, NICs and host stack.

If you disable accelerations on your NIC (say em0)

ifconfig em0 -txcsum -rxcsum

you can use the VALE switch to connect the NIC and the host stack:

vale-ctl -h valeXX:em0

allowing sharing the NIC with other netmap clients.

- THE USER API HAS SLIGHTLY CHANGED (head/cur/tail pointers
instead of pointers/count as before). This was unavoidable to support,
in the future, multiple threads operating on the same rings.
Netmap clients require very small source code changes to compile again.
On the plus side, the new API should be easier to understand
and the internals are a lot simpler.

The manual page has been updated extensively to reflect the current
features and give some examples.

This is the result of work of several people including Giuseppe Lettieri,
Vincenzo Maffione, Michio Honda and myself, and has been financially
supported by EU projects CHANGE and OPENLAB, from NetApp University
Research Fund, NEC, and of course the Universita` di Pisa.


# d480f5b8 02-Nov-2013 Konstantin Belousov <kib@FreeBSD.org>

Fix several issues with the busdma(9) KPI use in the e1000 drivers.
The problems do not affect bouncing busdma in a visible way, but are
critical for the dmar backend.

- The bus_dmamap_create(9) is not documented to take BUS_DMA_NOWAIT flag.
- Unload descriptor map after receive.
- Do not reset descriptor map to NULL, bus_dmamap_load(9) requires
valid map, and also this leaks the map.

Reported and tested by: pho
Approved by: jfv
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# ce3ee1e7 01-Nov-2013 Luigi Rizzo <luigi@FreeBSD.org>

update to the latest netmap snapshot.
This includes the following:
- use separate memory regions for VALE ports
- locking fixes
- some simplifications in the NIC-specific routines
- performance improvements for the VALE switch
- some new features in the pkt-gen test program
- documentation updates

There are small API changes that require programs to be recompiled
(NETMAP_API has been bumped so you will detect old binaries at runtime).

In particular:
- struct netmap_slot now is 16 bytes to support an extra pointer,
which may save one data copy when using VALE ports or VMs;
- the struct netmap_if has two extra fields;

MFC after: 3 days


# 76039bc8 26-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# c68534f1 12-Aug-2013 Scott Long <scottl@FreeBSD.org>

Update PCI drivers to no longer look at the MEMIO-enabled bit in the PCI
command register. The lazy BAR allocation code in FreeBSD sometimes
disables this bit when it detects a range conflict, and will re-enable
it on demand when a driver allocates the BAR. Thus, the bit is no longer
a reliable indication of capability, and should not be checked. This
results in the elimination of a lot of code from drivers, and also gives
the opportunity to simplify a lot of drivers to use a helper API to set
the busmaster enable bit.

This change fixes some recent reports of disk controllers and their
associated drives/enclosures disappearing during boot.

Submitted by: jhb
Reviewed by: jfv, marius, achadd, achim
MFC after: 1 day


# 4dc63104 12-Aug-2013 Jack F Vogel <jfv@FreeBSD.org>

Improve the MSIX setup code in the drivers, thanks to Marius for
the changes. Make sure that pci_alloc_msix() does give us the vectors
we need and fall back to MSI when it doesn't, also release any that
were allocated when insufficient.

MFC after: 3 days
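
The fallback decision described above can be sketched as a pure function. This is an illustrative model only: the names msix_decide_model, struct msix_result, and enum intr_mode_model are hypothetical, and the inputs stand for the return value of pci_alloc_msix() and the vector count it wrote back.

```c
/* Hypothetical interrupt modes for the model. */
enum intr_mode_model { INTR_MODEL_MSIX, INTR_MODEL_MSI };

struct msix_result {
	enum intr_mode_model mode;  /* which mode the driver ends up using */
	int release_vectors;        /* nonzero: granted vectors must be released */
};

/*
 * wanted:      number of MSI-X vectors the driver needs
 * alloc_error: modeled return value of the allocation (0 on success)
 * granted:     number of vectors actually handed back
 */
static struct msix_result
msix_decide_model(int wanted, int alloc_error, int granted)
{
	struct msix_result r = { INTR_MODEL_MSI, 0 };

	if (alloc_error == 0 && granted >= wanted) {
		r.mode = INTR_MODEL_MSIX;
		return r;
	}
	/* Too few vectors: give back what was granted, then use MSI. */
	if (alloc_error == 0 && granted > 0)
		r.release_vectors = 1;
	return r;
}
```

In a real driver the release step would correspond to a call such as pci_release_msi() before attempting MSI allocation.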


# d0913b7f 06-Aug-2013 Jack F Vogel <jfv@FreeBSD.org>

Make the various driver MSIX setup routines fallback to MSI more
gracefully. This change was suggested by Marius Strobl, thank you.

PR: kern/181016
MFC after: ASAP


# a1db87ec 12-Jul-2013 Jack F Vogel <jfv@FreeBSD.org>

Change the E1000 driver option header handling to match the
ixgbe driver. As it was, when building them as a module INET
and INET6 are not defined. In these drivers it does not cause
a panic, however it does result in different behavior in the
ioctl routine when you are using a module vs static, and I
think the behavior should be the same.

MFC after: 3 days


# 4dc07530 09-May-2013 Luigi Rizzo <luigi@FreeBSD.org>

if_lem.c: make sure that lem_rxeof() can drain the entire rx queue
irrespective of the setting of lem_rx_process_limit, while
giving a chance to the taskqueue scheduler to act after
each chunk.
This makes lem_rxeof similar to the one in if_em.c and if_igb.c .

if_lem.c and if_em.c: add a sysctl to manually configure the
'itr' moderation register.

Approved by: Jack Vogel


# 14054781 09-May-2013 Luigi Rizzo <luigi@FreeBSD.org>

simplify the code to initialize the RDT while in netmap mode.


# d61ba752 30-Apr-2013 Luigi Rizzo <luigi@FreeBSD.org>

use netmap_rx_irq() / netmap_tx_irq() to handle interrupts in
netmap mode, removing the logic from individual drivers.

(note: if_lem.c not updated yet due to some other pending modifications)


# 386c110e 15-Apr-2013 Jack F Vogel <jfv@FreeBSD.org>

Corrections to the RX checksum code, make sure it's disabled as
well as enabled when necessary. And simplify the checksum routine
itself, adding UDP bit to the test. Thanks to Kevin Lo for pointing
out the problems and code suggestions.


# 3b0b7ffb 03-Apr-2013 Jack F Vogel <jfv@FreeBSD.org>

Correct the multicast handling in the E1000 drivers as was
done in ixgbe, thanks to Mike Karels for this fix. When exiting
promiscuous mode MPE bit was being unconditionally cleared, this
should not be done if we are in MAX multicast groups.
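
A minimal sketch of the corrected logic, assuming a receive-control register with one unicast-promiscuous bit and one multicast-promiscuous bit. The macro names, bit values, and the table size of 128 are illustrative assumptions, not taken from the driver headers.

```c
#include <stdint.h>

#define RCTL_UPE_MODEL      0x00000008u  /* unicast promiscuous (assumed bit) */
#define RCTL_MPE_MODEL      0x00000010u  /* multicast promiscuous (assumed bit) */
#define MPE_MAX_MCAST_MODEL 128          /* assumed multicast filter capacity */

/* Returns the register value to write when leaving promiscuous mode. */
static uint32_t
disable_promisc_model(uint32_t rctl, int mcast_count)
{
	rctl &= ~RCTL_UPE_MODEL;
	/*
	 * Keep MPE set when the filter table is full: the hardware filter
	 * cannot hold every group, so all multicast must still be accepted.
	 */
	if (mcast_count < MPE_MAX_MCAST_MODEL)
		rctl &= ~RCTL_MPE_MODEL;
	return rctl;
}
```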


# 6ab6bfe3 20-Feb-2013 Jack F Vogel <jfv@FreeBSD.org>

Refresh on the shared code for the E1000 drivers.
- bear with me, there are lots of white space changes, I would not
do them, but I am a mere consumer of this stuff and if these drivers
are to stay in shape they need to be taken.

em driver changes: support for the new i217/i218 interfaces

igb driver changes:
- TX mq start has a quick turnaround to the stack
- Link/media handling improvement
- When link status changes happen the current flow control state
will now be displayed.
- A few white space/style changes.

lem driver changes:
- the shared code uncovered a bogus write to the RLPML register
(which does not exist in this hardware) in the vlan code, this
is removed.


# ded5ea6a 07-Feb-2013 Randall Stewart <rrs@FreeBSD.org>

This fixes an out-of-order problem with several
of the newer drivers. The basic problem was
that the driver was pulling the mbuf off the
drbr ring and then when sending with xmit(), encountering
a full transmit ring. Thus the lower layer
xmit() function would return an error, and the
drivers would then append the data back on to the ring.
For TCP this is a horrible scenario sure to bring
on a fast-retransmit.

The fix is to use drbr_peek() to pull the data pointer
but not remove it from the ring. If it fails then
we either call the new drbr_putback or drbr_advance
method. Advance moves it forward (we do this sometimes
when the xmit() function frees the mbuf). When
we succeed we always call advance. The
putback will always copy the mbuf back to the top
of the ring. Note that the putback *cannot* be used
with a drbr_dequeue(), only with drbr_peek(). Most
of the time, putback would not need to copy it
back since most likely the mbuf is still the same, but
sometimes xmit() functions will change the mbuf via
a pullup or other call. So the optimal case for
the single consumer is to always copy it back. If
we ever do a multiple_consumer (for lagg?) we
will need a test and atomic in the putback, possibly
a separate putback_mc() in the ring buf.

Reviewed by: jhb@freebsd.org, jlv@freebsd.org
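
The peek/advance/putback pattern can be illustrated with a small single-consumer ring in userland C. Everything here is a hypothetical model: ring_model, tx_model, and the *_model functions only mimic the roles of drbr_peek(), drbr_advance(), drbr_putback(), and a driver xmit() routine that fails when the hardware ring is full.

```c
#include <stddef.h>

#define RING_SLOTS_MODEL 8

struct ring_model {
	void    *slot[RING_SLOTS_MODEL];
	unsigned head, tail;            /* consumer and producer indexes */
};

struct tx_model { int free_desc; };     /* free hardware descriptors */

static int ring_enqueue_model(struct ring_model *r, void *m)
{
	if (r->tail - r->head == RING_SLOTS_MODEL)
		return -1;              /* software ring full */
	r->slot[r->tail++ % RING_SLOTS_MODEL] = m;
	return 0;
}

/* Look at the head entry without removing it. */
static void *ring_peek_model(struct ring_model *r)
{
	return r->head == r->tail ? NULL : r->slot[r->head % RING_SLOTS_MODEL];
}

static void ring_advance_model(struct ring_model *r) { r->head++; }

/* Always copy the (possibly changed) pointer back into the head slot. */
static void ring_putback_model(struct ring_model *r, void *m)
{
	r->slot[r->head % RING_SLOTS_MODEL] = m;
}

/* Modeled xmit(): fails when no descriptors are free; may rewrite *mp. */
static int xmit_model(struct tx_model *tx, void **mp)
{
	(void)mp;
	if (tx->free_desc == 0)
		return -1;
	tx->free_desc--;
	return 0;
}

/* Returns the number of packets handed to the hardware, in order. */
static int drain_model(struct ring_model *r, struct tx_model *tx)
{
	void *m;
	int sent = 0;

	while ((m = ring_peek_model(r)) != NULL) {
		if (xmit_model(tx, &m) != 0) {
			if (m == NULL)
				ring_advance_model(r);   /* xmit freed it */
			else
				ring_putback_model(r, m); /* keep at head */
			break;
		}
		ring_advance_model(r);           /* success: consume */
		sent++;
	}
	return sent;
}
```

Because a failed packet stays at the head of the ring, the next call to drain_model() retries it first and ordering is preserved.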


# 61bfd867 30-Jan-2013 Sofian Brabez <sbz@FreeBSD.org>

Use DEVMETHOD_END macro defined in sys/bus.h instead of {0, 0} sentinel on device_method_t arrays

Reviewed by: cognet
Approved by: cognet


# c6499ecc 04-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags in sys/dev.


# a8de37b0 22-Oct-2012 Eitan Adler <eadler@FreeBSD.org>

This isn't functionally identical. In some cases a hint to disable
unit 0 would in fact disable all units.

This reverts r241856

Approved by: cperciva (implicit)


# 76b75122 21-Oct-2012 Eitan Adler <eadler@FreeBSD.org>

Now that device disabling is generic, remove extraneous code from the
device drivers that used to provide this feature.

Reviewed by: des
Approved by: cperciva
MFC after: 1 week


# 063efed2 28-Sep-2012 Gleb Smirnoff <glebius@FreeBSD.org>

The drbr(9) API appeared to be so unclear that most drivers in
tree used it incorrectly, which led to inaccurate, overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Neither does dequeuing
mean that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.

o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
ALTQ queueing or buf_ring(9) queueing.
- It doesn't touch the buf_ring stats any more.
- It doesn't touch ifnet stats anymore.
- drbr_stats_update() no longer exists.

o buf_ring(9) handles its stats itself:
- It handles br_drops itself.
- br_prod_bytes stats are dropped. Rationale: no one ever
reads them but update of a common counter on every packet
negatively affects performance due to excessive cache
invalidation.
- buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
we no longer account bytes.

o Drivers handle their stats themselves: if_obytes, if_omcasts.

o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer
use drbr_stats_update(), and update ifnet stats themselves.

o bxe(4) was the most correct driver, it didn't call
drbr_stats_update(), thus it was the only driver accurate under
moderate load. Now it also maintains stats itself.

o ixgbe(4) had already taken stats from hardware, so just
- drop software stats updating.
- take multicast packet count from hardware as well.

o mxge(4) just no longer needs NO_SLOW_STATS define.

o cxgb(4), cxgbe(4) need no change, since they obtain stats
from hardware.

Reviewed by: jfv, gnn


# 126a39ce 23-Sep-2012 Sean Bruno <sbruno@FreeBSD.org>

This patch fixes a nit in the em, lem, and igb driver statistics. Increment
adapter->dropped_pkts instead of if_ierrors because if_ierrors is
overwritten by hw stats collection.

Submitted by: Andrew Boyer <aboyer@averesystems.com>
Reviewed by: Jack F Vogel <jfv@freebsd.org>
MFC after: 2 weeks


# e935190a 18-Sep-2012 Gavin Atkinson <gavin@FreeBSD.org>

Switch some PCI register reads from using magic numbers to using the names
defined in pcireg.h

MFC after: 1 week


# 389c8bd5 18-Sep-2012 Gavin Atkinson <gavin@FreeBSD.org>

Align the PCI Express #defines with the style used for the PCI-X
#defines. This also has the advantage that it makes the names more
compact, and also allows us to correct the non-uniform naming of
the PCIM_LINK_* defines, making them all consistent amongst themselves.

This is a mostly mechanical rename:
s/PCIR_EXPRESS_/PCIER_/g
s/PCIM_EXP_/PCIEM_/g
s/PCIM_LINK_/PCIEM_LINK_/g

When this is MFC'd, #defines will be added for the old names to assist
out-of-tree drivers.

Discussed with: jhb
MFC after: 1 week


# 252781f4 15-Aug-2012 Jack F Vogel <jfv@FreeBSD.org>

Customer report of a panic on boot due to the old
"m_getjcl:invalid cluster type" that occurred some
time back with the igb driver. This happens often when
booting over the net. I believe the NIC hardware is left
in a warm state when handed over to the driver, and a stray
RX interrupt happens earlier than the code is prepared for
it to happen. This change was verified to fix the problem,
it's kind of a bandaid... but it is similar to what was done
in the igb code.


# fcc144ad 07-Jul-2012 Jack F Vogel <jfv@FreeBSD.org>

Change the interface to the Energy Efficient Ethernet (EEE)
setting in the igb and em driver. This was necessitated by
a shared code change that I was given late in the game, a data
type changed from bool to int, in the last update I dealt with
it by a cast, but it was pointed out (thanks jhb) that there
was a potential problem with this. John suggested this safer
approach, and it is fine with me...

MFC after:2 days (to catch the 9.1 update)


# ab5d0362 05-Jul-2012 Jack F Vogel <jfv@FreeBSD.org>

Sync with Intel internal source:
shared code update and small changes in core required
Add support for new i210/i211 devices
Improve queue calculation based on mac type

MFC after:5 days


# 4d8b94d2 10-May-2012 Kevin Lo <kevlo@FreeBSD.org>

Initialize "error" to zero when it's declared in em_setup_receive_ring()


# d8a86483 30-Mar-2012 John Baldwin <jhb@FreeBSD.org>

Fix a few issues with transmit handling in em(4) and igb(4):
- Do not define the foo_start() methods or set if_start in the ifnet if
multiq transmit is enabled. Also, set if_transmit and if_qflush before
ether_ifattach rather than after when multiq transmit is enabled. This
helps to ensure that the drivers never try to mix different transmit
methods.
- Properly restart transmit during resume. igb(4) was not restarting it
at all, and em(4) was restarting even if the link was down and was
calling the wrong method if multiq transmit was enabled.
- Remove all the 'more' handling for transmit completions. Transmit
completion processing does not have a processing limit, so it always
runs to completion and never has more work to do when it returns.
Instead, the previous code was returning 'true' anytime there were
packets in the queue that weren't still in the process of being
transmitted. The effect was that the driver would continuously
reschedule a task to process TX completions in effect running at 100%
CPU polling the hardware until it finished transmitting all of the
packets in the ring. Now it will just wait for the next TX completion
interrupt.
- Restart packet transmission when the link becomes active.
- Fix the MSI-X queue interrupt handlers to restart packet transmission if
there are pending packets in the relevant software queue (IFQ or buf_ring)
after processing TX completions. This is the root cause for the OACTIVE
hangs as if the MSI-X queue handler drained all the pending packets from
the TX ring, nothing would ever restart it. As such, remove some
previously-added workarounds to reschedule a task to poll the TX ring
anytime OACTIVE was set.

Tested by: sbruno
Reviewed by: jfv
MFC after: 1 week


# 64ae02c3 27-Feb-2012 Luigi Rizzo <luigi@FreeBSD.org>

A bunch of netmap fixes:

USERSPACE:
1. add support for devices with different number of rx and tx queues;

2. add better support for zero-copy operation, adding an extra field
to the netmap ring to indicate how many buffers we have already processed
but not yet released (with help from Eddie Kohler);

3. The two changes above unfortunately require an API change, so while
at it add a version field and some spares to the ioctl() argument
to help detect mismatches.

4. update the manual page for the two changes above;

5. update sample applications in tools/tools/netmap

KERNEL:

1. simplify the internal structures moving the global wait queues
to the 'struct netmap_adapter';

2. simplify the functions that map kring<->nic ring indexes

3. normalize device-specific code, helps maintenance;

4. start exploring the impact of micro-optimizations (prefetch etc.)
in the ixgbe driver.
Use 'legacy' descriptors on the tx ring and prefetch slots gives
about 20% speedup at 900 MHz. Another 7-10% would come from removing
the explicit calls to bus_dmamap* in the core (they are effectively
NOPs in this case, but it takes expensive load of the per-buffer
dma maps to figure out that they are all NULL).

Rx performance not investigated.

I am postponing the MFC so i can import a few more improvements
before merging.


# 5644ccec 15-Feb-2012 Luigi Rizzo <luigi@FreeBSD.org>

(This commit only touches code within the DEV_NETMAP blocks)

Introduce some functions to map NIC ring indexes into netmap ring
indexes and vice versa. This way we can implement the bound
checks only in one place (and hopefully in a correct way).

On passing, make the code and comments more uniform across the
various drivers.


# ce9f43b4 12-Jan-2012 Luigi Rizzo <luigi@FreeBSD.org>

clear the pointer after freeing the mbuf. Without that, we
risk a double free if the subsequent mbuf allocation fails.
This bug is not netmap-related and was introduced in rev. 228387
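
The double-free hazard and its fix can be shown with a tiny userland model. struct rx_slot_model and slot_release_model() are hypothetical names; free() stands in for the mbuf free routine.

```c
#include <stdlib.h>

/* Hypothetical receive-buffer slot holding one allocated buffer. */
struct rx_slot_model {
	void *m_head;
};

/*
 * Release the buffer held by a slot. free_calls counts how many
 * times memory was actually freed, for illustration only.
 */
static void
slot_release_model(struct rx_slot_model *s, int *free_calls)
{
	if (s->m_head != NULL) {
		free(s->m_head);
		(*free_calls)++;
		s->m_head = NULL;  /* the fix: drop the stale pointer */
	}
}
```

If a later allocation fails and the error path releases the slot again, the second call finds NULL and does nothing instead of freeing the same memory twice.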


# 467bd5c2 12-Jan-2012 Luigi Rizzo <luigi@FreeBSD.org>

fix the initialization of the rings when netmap is used,
to adapt it to the changes in 228387 .
Now the code is similar to the one used in other drivers.
Not applicable to stable/9 and stable/8


# 6e10c8b8 10-Jan-2012 Luigi Rizzo <luigi@FreeBSD.org>

small code cleanup in preparation for future modifications in
the memory allocator used by netmap. No functional change,
two small bug fixes:
- in if_re.c add a missing bus_dmamap_sync()
- in netmap.c comment out a spurious free() in an error handling block


# 5bbe0c53 07-Jan-2012 Kevin Lo <kevlo@FreeBSD.org>

ether_ifattach() sets if_mtu to ETHERMTU, don't bother set it again

Reviewed by: yongari


# 19d52de5 05-Jan-2012 Robert Watson <rwatson@FreeBSD.org>

When extracting the VLAN tag from if_em and if_lem receive descriptor
rings, copy the whole VLAN tag, not just the VLAN ID. This fixes a
problem in which VLAN priority information was dropped when using
offloaded VLAN processing with these drivers.

Discussed with: jfv, rrs
Sponsored by: ADARA Networks, Inc.
MFC after: 3 days
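
Why masking down to the VLAN ID loses the priority can be seen from the 802.1Q tag layout: the top 3 bits carry the priority, 1 bit is the drop-eligible indicator, and the low 12 bits are the VLAN ID. The helper names below are hypothetical and are not the driver's or the stack's macros.

```c
#include <stdint.h>

/* Low 12 bits of the 16-bit tag: the VLAN ID. */
static uint16_t
vlan_id_model(uint16_t tag)
{
	return (uint16_t)(tag & 0x0FFFu);
}

/* Top 3 bits of the 16-bit tag: the priority code point. */
static uint8_t
vlan_prio_model(uint16_t tag)
{
	return (uint8_t)((tag >> 13) & 0x7u);
}
```

A tag with priority 5 and VLAN ID 100 still reports priority 5 when the whole tag is stored, but vlan_prio_model(vlan_id_model(tag)) yields 0, which is the information loss the change above removes.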


# 62aca365 11-Dec-2011 Jack F Vogel <jfv@FreeBSD.org>

Last change still had an issue, one more time...


# 133f283b 11-Dec-2011 Jack F Vogel <jfv@FreeBSD.org>

Correct LINT build issues in the ioctl code.


# 96b38ade 10-Dec-2011 Jack F Vogel <jfv@FreeBSD.org>

Fix NETMAP code problem in the build.


# fd33ce41 10-Dec-2011 Jack F Vogel <jfv@FreeBSD.org>

Part 2 of 2 New deltas for the 1G drivers.

There have still been intermittent problems with apparent TX
hangs for some customers. These have been problematic to reproduce
but I believe these changes will address them. Testing on a number
of fronts have been positive.

EM: there is an important 'chicken bit' fix for 82574 in the shared
code this is supported in the core here.
- The TX path has been tightened up to improve performance. In
particular UDP with jumbo frames was having problems, and the
changes here have improved that.
- OACTIVE has been used more carefully on the theory that some
hangs may be due to a problem in this interaction
- Problems with the RX init code, the "lazy" allocation and
ring initialization has been found to cause problems in some
newer client systems, and as it really is not that big a win
(its not in a hot path) it seems best to remove it.
- HWTSO was broken when VLAN HWTAGGING or HWFILTER is used, I
found this was due to an error in setting up the descriptors
in em_xmit.

IGB:
- TX is also improved here. With multiqueue I realized its very
important to handle OACTIVE only under the CORE lock so there
are no races between the queues.
- Flow Control handling was broken in a couple ways, I have changed
and I hope improved that in this delta.
- UDP also had a problem in the TX path here, it was change to
improve that.
- On some hardware, with the driver static, a weird stray interrupt
seems to sometimes fire and cause a panic in the RX mbuf refresh
code. This is addressed by setting interrupts late in the init
path, and also to set all interrupts bits off at the start of that.


# 579a6e3c 05-Dec-2011 Luigi Rizzo <luigi@FreeBSD.org>

add netmap support for "em", "lem", "igb" and "re".

On my hardware, "em" in netmap mode does about 1.388 Mpps
on one card (on an Asus motherboard), and 1.1 Mpps on another
card (PCIe bus). Both seem to be NIC-limited, because
i have the same rate even with the CPU running at 150 MHz.

On the "re" driver the tx throughput is around 420-450 Kpps
on various (8111C and the like) chipsets. On the Rx side
performance seems much better, and i can receive the full
load generated by the "em" cards.

"igb" is untested as i don't have the hardware.


# 6472ac3d 07-Nov-2011 Ed Schouten <ed@FreeBSD.org>

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


# b37e0f6e 29-Jun-2011 John Baldwin <jhb@FreeBSD.org>

- Add read-only sysctls for all of the tunables supported by the igb and
em drivers.
- Make the per-instance 'enable_aim' sysctl truly per-instance by having it
change a per-instance variable (which is used to control AIM) rather
than having all of the per-instance sysctls operate on a single global
variable.

Reviewed by: jfv (earlier version)
MFC after: 1 week


# 3cec53b8 05-May-2011 Jack F Vogel <jfv@FreeBSD.org>

Add an initialization to the error variable, without
this there is a rare return path that bogusly appears
to fail when it should not. Also white space correction.

Thanks to Arnaud Lacombe for noticing the problem.


# 62d8da8c 01-Apr-2011 Jack F Vogel <jfv@FreeBSD.org>

Fix to an error condition case, when an mbuf chain
gets defragged due to a mapping failure the header
pointers will be invalidated and can result in a
TSO or other failure down the line. So, when the
remapping occurs force a retry thru the offload
calculation code. Thanks to Andrew Boyer for discovering
this and cooking up the fix!!


# e61e0b91 01-Apr-2011 Jack F Vogel <jfv@FreeBSD.org>

Change the refresh_mbuf logic slightly, add an inline
to calculate the outstanding descriptors that need to be
refreshed at any time, and use THAT in rxeof to determine
if refreshing needs to be done. Also change the local_timer
to simply fire off the appropriate interrupt rather than
schedule a tasklet, its simpler.

MFC in two weeks
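
A sketch of what such an inline helper might compute, under stated assumptions: the ring has num_desc slots, next_to_check is the next descriptor the receive path will examine, next_to_refresh is the last position that was given a fresh buffer, and one slot is kept between the two so the indexes never collide. The function name and parameters are hypothetical.

```c
#include <stdint.h>

/* Number of descriptors that have been consumed but not yet refreshed. */
static inline uint16_t
rx_unrefreshed_model(uint16_t next_to_check, uint16_t next_to_refresh,
    uint16_t num_desc)
{
	if (next_to_check > next_to_refresh)
		return (uint16_t)(next_to_check - next_to_refresh - 1);
	/* The check index has wrapped past the end of the ring. */
	return (uint16_t)(num_desc + next_to_check - next_to_refresh - 1);
}
```

With a 256-entry ring, indexes (10, 4) give 5 outstanding descriptors and the wrapped case (2, 250) gives 7; a receive loop could compare this count against a threshold to decide when to refresh.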


# 3b0a4aef 23-Mar-2011 John Baldwin <jhb@FreeBSD.org>

Do a sweep of the tree replacing calls to pci_find_extcap() with calls to
pci_find_cap() instead.


# 1fd3c44f 18-Mar-2011 Jack F Vogel <jfv@FreeBSD.org>

This delta updates the em driver to version 7.2.2 which has
been undergoing test for some weeks. This improves the RX
mbuf handling to avoid system hang due to depletion. Thanks
to all those who have been testing the code, and to Beezar
Liu for the design changes.

Next the igb driver is updated for similar RX changes, but
also to add new features support for our upcoming i350 family
of adapters.

MFC after a week


# fbfbce8a 19-Jan-2011 Jack F Vogel <jfv@FreeBSD.org>

Fix for kern/152853, pullup at the wrong point
is breaking UDP. Thanks to Petr Lampa for the
patch.


# 5bc0787f 18-Jan-2011 Matthew D Fleming <mdf@FreeBSD.org>

Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need
to rely on the format string.


# 8c49f187 12-Jan-2011 Matthew D Fleming <mdf@FreeBSD.org>

sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.

Commit the Intel drivers.


# 599564e6 11-Jan-2011 Jack F Vogel <jfv@FreeBSD.org>

A couple problems discovered by Andrew Boyer:
- failure code in em_xmit got mangled along the way
and was not properly handling errors.
- local timer code had a leftover UNLOCK call that
should be removed.

MFC after 3 days


# 1ce42f72 03-Dec-2010 Jack F Vogel <jfv@FreeBSD.org>

Correct build error.


# 9d43b64d 03-Dec-2010 Jack F Vogel <jfv@FreeBSD.org>

Small cut and paste bug in flow control string fixed.
Second, correct the discard/refresh_mbufs code to behave
more like igb, there have been panics due to discards and
this should fix them.

MFC after: 3 days


# 12203744 24-Nov-2010 Jack F Vogel <jfv@FreeBSD.org>

The purpose of this change is to add a routine to
disable ASPM L0S and L1 LINK states on 82573, 82574,
and 82583. The theory is that this is behind certain
hangs being experienced by some customers.

Also included a small optimization in the rxeof routine
that was in my internal code.

Change the PBA size for pchlan, it was incorrect.

MFC after: 3 days


# e4c690b4 01-Nov-2010 Jack F Vogel <jfv@FreeBSD.org>

Sync the lem code up with the vlan and other fixes in em.
Delete a unneeded test from the beginning of em_xmit.
CRITICAL: shared code fix for 82574, a mutex might not be
released, this can cause hangs.


# 35928b33 27-Oct-2010 Jack F Vogel <jfv@FreeBSD.org>

In the data setup code for doing offloads the
ip and tcp pointers were not reset after some
pullups. In practice this led to an NFS mount
failure when using UDP reported by Kevin Lo,
thanks Kevin. Fix from yongari, thank you!


# 7deff7f9 25-Oct-2010 Jack F Vogel <jfv@FreeBSD.org>

Bug fix delta to the em driver:
- Chasing down bogus watchdogs has led to an improved
design to this handling, the hang decision takes
place in the tx cleanup, with only a simple report
check in local_timer. Our tests have shown no false
watchdogs with this code.
- VLAN fixes from jhb, the shadow vfta should be per
interface, but as global it was not. Thanks John.
- Bug fixes in the support for new PCH2 hardware.
- Thanks for all the help and feedback on the driver,
changes to lem will be coming shortly as well.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 7d9119bd 27-Sep-2010 Jack F Vogel <jfv@FreeBSD.org>

Update code from Intel:
- Sync shared code with Intel internal
- New client chipset support added
- em driver - fixes to 82574, limit queues to 1 but use MSIX
- em driver - large changes in TX checksum offload and tso
code, thanks to yongari.
- some small changes for watchdog issues.
- igb driver - local timer watchdog code was missing locking
this and a couple other watchdog related fixes.
- bug in rx discard found by Andrew Boyer, check for null pointer

MFC: a week


# 8385f4cf 20-Sep-2010 John Baldwin <jhb@FreeBSD.org>

Tweak the stats exported by the e1000 drivers:
- Add a single sysctl procedure to all three drivers to read an arbitrary
register (the register is passed as arg2). Use it to replace existing
routines in igb(4) that used a separate routine for each register, and
to add support for missing stats in em(4) and lem(4).
- Move the 'rx_overruns' and 'watchdog_timeouts' stats out of the MAC stats
section as they are driver stats, not MAC counters.
- Simplify the code that creates per-queue stats in igb(4) to use a single
loop and remove duplicated code.
- Properly read all 64 bits of the 'good octets received/transmitted' in
em(4) and lem(4).
- Actually read the interrupt count registers in em(4), and drop the
'host to card' sysctl stats from em(4) as they are not implemented in
any of the hardware this driver supports.
- Restore several stats to em(4) that were lost in the earlier stats
conversion including per-queue stats.
- Export several MAC stats in em(4) that were exported in igb(4) but not
in em(4).
- Export stats in lem(4) using individual sysctls as in em(4) and igb(4).

Reviewed by: jfv
MFC after: 1 week
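
The idea of one handler serving many statistics, with the register selected by an argument, can be modeled in userland C. The names are hypothetical, the register file is a plain array indexed by byte offset, and real hardware concerns such as clear-on-read counters are not modeled. The second function shows how a 64-bit counter split across two 32-bit registers can be combined.

```c
#include <stdint.h>

/* Hypothetical adapter with a fake register file. */
struct adapter_model {
	const uint32_t *regs;   /* indexed by byte offset / 4 */
};

/* One handler for every register: the offset plays the role of arg2. */
static uint32_t
sysctl_reg_handler_model(const struct adapter_model *a, unsigned reg_offset)
{
	return a->regs[reg_offset / 4];
}

/* Combine the low and high halves of a 64-bit octet counter. */
static uint64_t
read_counter64_model(const struct adapter_model *a, unsigned lo_off,
    unsigned hi_off)
{
	uint64_t hi = sysctl_reg_handler_model(a, hi_off);
	uint64_t lo = sysctl_reg_handler_model(a, lo_off);

	return (hi << 32) | lo;
}
```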


# 26c88ee8 07-Sep-2010 Jack F Vogel <jfv@FreeBSD.org>

Code correction in refresh_mbufs, just continuing
without index recalc was wrong.


# d9f1a5aa 07-Sep-2010 Jack F Vogel <jfv@FreeBSD.org>

Tighten up the rx mbuf refresh code, there were some
discrepancies from the igb version which was the target.

Change the message when neither MSI or MSIX are enabled
and a fallback to Legacy interrupts happen, the existing
message was confusing.


# dd20cce1 27-Aug-2010 Pyun YongHyeon <yongari@FreeBSD.org>

Do not allocate multicast array memory in multicast filter
configuration function. For failed memory allocations, em(4)/lem(4)
called panic(9) which is not acceptable on production box.
igb(4)/ixgb(4)/ix(4) allocated the required memory in stack which
consumed 768 bytes of stack memory which looks too big.

To address these issues, allocate multicast array memory in device
attach time and make multicast configuration success under any
conditions. This change also removes the excessive use of memory in
stack.

Reviewed by: jfv
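
The allocate-once approach can be sketched as follows. This is a userland model with hypothetical names; it assumes 128 addresses of 6 bytes each, which matches the 768 bytes quoted above, and uses malloc() where a driver would use the kernel allocator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MTA_ADDR_LEN_MODEL  6    /* bytes per Ethernet address */
#define MTA_MAX_ADDRS_MODEL 128  /* assumed filter capacity: 128 * 6 = 768 */

struct mc_softc_model {
	uint8_t *mta;            /* multicast array, owned for device lifetime */
};

/* Attach time: the only place where allocation can fail. */
static int
attach_model(struct mc_softc_model *sc)
{
	sc->mta = malloc(MTA_ADDR_LEN_MODEL * MTA_MAX_ADDRS_MODEL);
	return sc->mta == NULL ? -1 : 0;
}

/* Filter update: reuses the buffer, so it cannot fail on allocation. */
static int
set_multi_model(struct mc_softc_model *sc,
    const uint8_t (*addrs)[MTA_ADDR_LEN_MODEL], int n)
{
	int i;

	if (n > MTA_MAX_ADDRS_MODEL)
		n = MTA_MAX_ADDRS_MODEL;
	memset(sc->mta, 0, MTA_ADDR_LEN_MODEL * MTA_MAX_ADDRS_MODEL);
	for (i = 0; i < n; i++)
		memcpy(&sc->mta[i * MTA_ADDR_LEN_MODEL], addrs[i],
		    MTA_ADDR_LEN_MODEL);
	return n;                /* number of addresses programmed */
}

static void
detach_model(struct mc_softc_model *sc)
{
	free(sc->mta);
	sc->mta = NULL;
}
```

Moving the allocation to attach means a memory shortage is reported once, as an attach failure, rather than as a panic or a large stack frame on every filter update.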


# 880a50b5 27-Aug-2010 Pyun YongHyeon <yongari@FreeBSD.org>

If em(4) failed to allocate RX buffers, do not call panic(9).
Just showing some buffer allocation error is more appropriate
action for drivers. This should fix occasional panic reported on
em(4) when driver encountered resource shortage.

Reviewed by: jfv


# ad1917be 27-Aug-2010 Pyun YongHyeon <yongari@FreeBSD.org>

Do not call voluntary panic(9) in case of if_alloc() failure.

Reviewed by: jfv


# 9886a800 12-Jul-2010 Jack F Vogel <jfv@FreeBSD.org>

Fix for a panic when TX checksum offload is done and
a packet has only a header in the first mbuf, the
checksum code will dereference a pointer into the
non-existing IP header. Do a check for the size and
pullup if needed. Thanks to Michael Tuexen for this
fix.

MFC: asap - should be in 8.1 IMHO


# b7741e7a 17-Jun-2010 Jack F Vogel <jfv@FreeBSD.org>

Two stats were duplicated, thanks to Andrew Boyer
for pointing this out.


# fdbf7e3c 16-Jun-2010 George V. Neville-Neil <gnn@FreeBSD.org>

Move statistics into the sysctl tree making it easier to find
and use them.
Add previously hidden statistics, some of which include interrupt
and host/card communication counters.


# dfc14ce0 16-Jun-2010 Jack F Vogel <jfv@FreeBSD.org>

Changes from John Baldwin adding to last commit,
change rxeof api for poll friendliness, and
eliminate unnecessary link tasklet use. Thanks John!


# ed6b099b 18-May-2010 Marius Strobl <marius@FreeBSD.org>

MFC: r208117

Fix a mismerge in r206001 (MFC'ed to stable/8 in r206211).

PR: 146614
Approved by: jfv (implicit)


# 876ab8b5 15-May-2010 Marius Strobl <marius@FreeBSD.org>

Fix a mismerge in r206001.

PR: 146614
Approved by: jfv (implicit)
MFC after: 3 days


# 70defa90 14-May-2010 Jack F Vogel <jfv@FreeBSD.org>

Missing fix in lem code to limit WOL to MAGIC,
and made code backward compatible to 7.3 with
conditionals around the buf_ring_free call.


# 46168c54 14-May-2010 Jack F Vogel <jfv@FreeBSD.org>

Small changes preparing for MFC, need to conditionalize
the buf_ring_free call, and lem is missing the WOL change
put into em.


# beef45ff 28-Apr-2010 Jack F Vogel <jfv@FreeBSD.org>

Address the LOD that some are seeing, put the RX lock
back in rxeof (I could see little point in taking it out),
and now release it before the stack entry.

Also, make it so the 82574 does not configure for multiqueue
when its not used in the stack.


# 517ac329 28-Apr-2010 Jack F Vogel <jfv@FreeBSD.org>

Only enable MAGIC WOL by default, MCAST causes
systems to just wakeup immediately in many
environments.


# 1655af0a 28-Apr-2010 Jack F Vogel <jfv@FreeBSD.org>

Change default WOL back to MAGIC only, having
multicast enabled causes problems in many environments.


# ace006de 16-Apr-2010 Jack F Vogel <jfv@FreeBSD.org>

MFC bug fixes to em and igb from HEAD.


# d43a1187 14-Apr-2010 Jack F Vogel <jfv@FreeBSD.org>

Add a missing fragment in the tx msix handler to invoke
another if all work is not done.

Sync the igb driver with changes suggested by yongari and
made in em, these made sense to be in both drivers.


# 3b4e5df8 10-Apr-2010 Jack F Vogel <jfv@FreeBSD.org>

The lock move in rxeof necessitated a couple
more places to do the locking, fixes a panic.


# b4ab02b8 10-Apr-2010 Jack F Vogel <jfv@FreeBSD.org>

Correct broken build.


# 91ce5735 09-Apr-2010 Jack F Vogel <jfv@FreeBSD.org>

A few more changes from yongari:
- code flow in handler could let interrupt be
reenabled when not wanted.
- change where the RX lock is taken to improve
performance.
- adapter->msix is true for MSI systems also,
it needs to explicitly test for 82574, good one :)


# 681ac9c0 09-Apr-2010 Jack F Vogel <jfv@FreeBSD.org>

Incorporate suggested improvements from yongari.

Also, from feedback, make the multiqueue code an
option (EM_MULTIQUEUE) that is off by default.
Problems have been seen with UDP when it's on.


# 476310d3 08-Apr-2010 Jack F Vogel <jfv@FreeBSD.org>

Three changes:
- add CRC stripping to the RX side, this was handled
by some obscure code in rxeof previously, it's easier
to simply have the hardware strip it now.
- Add back an ALTQ change that slipped between the cracks
- Add an update to the watchdog_time in the xmit code, not
doing this in ixgbe caused problems, think it's needed here
as well.


# 500e8f26 07-Apr-2010 Jack F Vogel <jfv@FreeBSD.org>

Important fix got clobbered in the em driver, keeping
VLAN HWFILTER from being used by default, this breaks
stacked pseudo devices, and as it turns out, also breaks
virtual machines that happen to use VLANS (didn't know that
before :). Put the fix back into the em driver, and for good
measure add the same code to the igb driver where it should
have been anyway.


# d05b20c6 05-Apr-2010 Jack F Vogel <jfv@FreeBSD.org>

MFC of the em/igb drivers


# 79c7b719 31-Mar-2010 Marius Strobl <marius@FreeBSD.org>

Hook the identification LEDs of igb(4), lem(4) and em(4) devices up with
led(4) so they can be lit or, e.g., made to blink via
`echo f2 > /dev/led/em0` for localization purposes.

Approved by: jfv
MFC after: 1 week (after r205869)


# eaa9db2b 30-Mar-2010 Jack F Vogel <jfv@FreeBSD.org>

Fix lint build problem.


# 8ec87fc5 29-Mar-2010 Jack F Vogel <jfv@FreeBSD.org>

Update to igb and em:

em revision 7.0.0:
- Using driver devclass, separate legacy (pre-pcie) code
into a separate source file. This will at least help
protect against regression issues. It compiles along
with em, and is transparent to the end user, devices in each
appear to be 'emX'. When using em in a modular form this
also allows the legacy stuff to be defined out.
- Add tx and rx rings as in igb, in the 82574 this becomes
actual multiqueue for the first time (2 queues) while in
other PCIE adapters it just makes the code cleaner.
- Add RX mbuf handling logic that matches igb, this will
eliminate packet drops due to temporary mbuf shortage.

igb revision 1.9.3:
- Following the ixgbe code, use a new approach in what
was called 'get_buf', the routine now has been made
independent of rxeof, it now does the update to the
engine TDT register, this design allows temporary
mbuf resources to become non-critical, not requiring
a packet to be discarded, instead it just returns and
does not increment the tail pointer.
- With the above change it was also unnecessary to keep
'spare' maps around, since we do not have the discard
issue.
- Performance tweaks and improvements to the code also.
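
The refill behaviour described in the first igb item can be sketched
roughly as follows. This is only an illustration, not the driver's code:
rx_ring, alloc_buf, refresh_bufs, RING_SIZE and the tail field (standing
in for the hardware tail register) are all hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical ring size */

/* Hypothetical, simplified view of a receive ring. */
struct rx_ring {
	bool     has_buf[RING_SIZE];	/* slot currently holds a buffer */
	uint32_t next_to_refresh;	/* first slot still to be refilled */
	uint32_t tail;			/* stand-in for the tail register */
	int      bufs_available;	/* stand-in for the mbuf supply */
};

/* Stand-in allocator: succeeds while the pool has buffers left. */
static bool
alloc_buf(struct rx_ring *r)
{
	if (r->bufs_available == 0)
		return (false);
	r->bufs_available--;
	return (true);
}

/*
 * Refill slots from next_to_refresh up to (not including) limit.
 * On an allocation failure, stop and leave the rest for a later call.
 * Nothing is discarded, and the tail only moves past refilled slots.
 */
static void
refresh_bufs(struct rx_ring *r, uint32_t limit)
{
	uint32_t i = r->next_to_refresh;
	bool refreshed = false;

	while (i != limit) {
		if (!r->has_buf[i]) {
			if (!alloc_buf(r))
				break;	/* temporary shortage: just return */
			r->has_buf[i] = true;
		}
		refreshed = true;
		i = (i + 1) % RING_SIZE;
	}
	r->next_to_refresh = i;
	if (refreshed)
		r->tail = (i + RING_SIZE - 1) % RING_SIZE;
}
```

In this sketch a later call simply resumes from next_to_refresh once
buffers are available again, which is why no 'spare' maps are needed.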

MFC in a week


# 29f2c008 18-Mar-2010 Max Laier <mlaier@FreeBSD.org>

MFC r203834 and r205197: Make ALTQ work for drbr consumers.


# 193cbc4d 13-Feb-2010 Max Laier <mlaier@FreeBSD.org>

Fix drbr and altq interaction:
- introduce drbr_needs_enqueue that returns whether the interface/br needs
an enqueue operation: returns true if altq is enabled or there are
already packets in the ring (as we need to maintain packet order)
- update all drbr consumers
- fix drbr_flush
- avoid using the driver queue (IFQ_DRV_*) in the altq case as the
multiqueue consumer does not provide enough protection, serialize altq
interaction with the main queue lock
- make drbr_dequeue_cond work with altq
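
As a rough illustration of the rule stated for drbr_needs_enqueue, the
decision can be sketched as below. The tx_state structure and the
needs_enqueue name are hypothetical simplifications; the real function
takes the interface and buf_ring as arguments.

```c
#include <stdbool.h>

/* Hypothetical, simplified transmit-side state. */
struct tx_state {
	bool altq_enabled;	/* ALTQ is active on the interface */
	int  ring_count;	/* packets already waiting in the ring */
};

/*
 * Return true when a packet must be enqueued rather than handed
 * straight to the hardware: either ALTQ is enabled, or packets are
 * already queued and packet order has to be preserved.
 */
static bool
needs_enqueue(const struct tx_state *tx)
{
	if (tx->altq_enabled)
		return (true);
	return (tx->ring_count > 0);
}
```

A consumer would attempt a direct transmit only when this returns false.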

Discussed with: kmacy, yongari, jfv
MFC after: 4 weeks


# 7733cf8f 11-Feb-2010 Matt Jacob <mjacob@FreeBSD.org>

MFC a number of changes from head for ISP (203478,203463,203444,202418,201758,
201408,201325,200089,198822,197373,197372,197214,196162). Since one of those
changes was a semicolon cleanup from somebody else, this touches a lot more.


# 29e1e1a3 01-Feb-2010 Jack F Vogel <jfv@FreeBSD.org>

A few minor changes: add altq option header, add missing conditional
around a buf_ring call that will break 7.3, and thanks to Fabien Thomas
add POLLING support for igb and a minor related fix in the em driver.


# 146f2564 29-Jan-2010 Jack F Vogel <jfv@FreeBSD.org>

Fix for kern/141646: when stacking pseudo drivers like
lagg and vlan the vlan attach/detach event is not being
handed down to em, this caused some init code not to run,
and thus VLANs did not work. Ultimately having the event
get propagated would be nice, but for now the solution is
to have HWFILTER off by default, when this is the case
VLANs will work, ifconfig can be used to turn it on and
then get HW tag filtering.


# ac95ee29 26-Jan-2010 Jack F Vogel <jfv@FreeBSD.org>

Missing a fix for the new watchdog handling.


# a69ed8df 26-Jan-2010 Jack F Vogel <jfv@FreeBSD.org>

Update the 1G drivers, shared code sync with Intel,
igb now has a queue notion that has a single interrupt
with an RX/TX pair, this will reduce the total interrupts
seen on a system. Both em and igb have a new watchdog
method. igb has fixes from Pyun Yong-Hyeon that have
improved stability, thank you :)

I wish to MFC this for 7.3 asap, please test if able.


# c2ede4b3 07-Jan-2010 Martin Blapp <mbr@FreeBSD.org>

Remove extraneous semicolons, no functional changes.

Submitted by: Marc Balmer <marc@msys.ch>
MFC after: 1 week


# 4edd8523 07-Dec-2009 Jack F Vogel <jfv@FreeBSD.org>

Resync with Intel versions of both the em and igb
drivers. These add new hardware support, most importantly
the pch (i5 chipset) in the em driver. Also, both drivers
now have the simplified (and I hope improved) watchdog
code. The igb driver uses the new RX cleanup that I
first implemented in ixgbe.

em - version 6.9.24
igb - version 1.8.4


# b9a65dad 11-Sep-2009 Jack F Vogel <jfv@FreeBSD.org>

This fixes kern/138516, an mbuf leak in both the em
and igb driver, when a transmit fails the packet/mbuf
was not being requeued. Thanks to those that pointed
this problem out.

Approved by: re


# b53aa98e 10-Sep-2009 Jack F Vogel <jfv@FreeBSD.org>

Fix build complaint from previous checkin


# f3288884 10-Sep-2009 Jack F Vogel <jfv@FreeBSD.org>

Fix for pr 138516
An mbuf is not requeued when a xmit fails.

MFC: 3 days


# 67784314 08-Sep-2009 Poul-Henning Kamp <phk@FreeBSD.org>

Revert previous commit and add myself to the list of people who should
know better than to commit with a cat in the area.


# b34421bf 08-Sep-2009 Poul-Henning Kamp <phk@FreeBSD.org>

Add necessary include.


# 67a43534 19-Aug-2009 Xin LI <delphij@FreeBSD.org>

MFC r196386:

Temporarily enhance em(4) and igb(4) hack to take account for IFF_NOARP.
Without this changeset there will be no way to prevent these NICs from
sending ARP, which is harmful in server farms that are configured as
"Direct Server Return" behind a load balancer.

A better fix would remove the whole hack completely but it would be
later than 8.0-RELEASE.

Reviewed by: jfv, yongari
Approved by: re (kib)


# 1886a691 19-Aug-2009 Xin LI <delphij@FreeBSD.org>

Temporarily enhance em(4) and igb(4) hack to take account for IFF_NOARP.
Without this changeset there will be no way to prevent these NICs from
sending ARP, which is harmful in server farms that are configured as
"Direct Server Return" behind a load balancer.

A better fix would remove the whole hack completely but it would be
later than 8.0-RELEASE.

Reviewed by: jfv, yongari
Approved by: re (kib)


# 45289e2d 24-Jul-2009 Jack F Vogel <jfv@FreeBSD.org>

Improvement on the last change, this gives a precise
way to tell the one and only interface that a vlan
event is for. Thanks to John Baldwin for the patch.

Approved by: re


# 387424df 24-Jul-2009 Jack F Vogel <jfv@FreeBSD.org>

This delta fixes two bugs:
- When a vlan event occurs a check was not made that
the event was actually for the interface, thus resulting
in a panic. All three drivers have this vulnerability. Add
a check for this condition.
- Secondly, there was a duplicate buf_ring free in the em
driver resulting in a panic on unload. Remove.
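
One plausible form of the check in the first item is sketched below. It
assumes the handler receives the registering driver's private data as
its first argument and compares it with the interface's softc; the
adapter and ifnet_stub structures and the register_vlan name are
hypothetical stand-ins, not the driver's actual definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver softc and the interface. */
struct adapter {
	int num_vlans;
};

struct ifnet_stub {
	void *softc;	/* private data of the driver owning the interface */
};

/*
 * VLAN-config event handler. The event is delivered to every
 * registered handler, so each instance must ignore events that are
 * for some other interface. Returns true if the event was accepted.
 */
static bool
register_vlan(void *arg, struct ifnet_stub *ifp, uint16_t vtag)
{
	struct adapter *adapter = arg;

	if (ifp->softc != arg)		/* not our event */
		return (false);
	if (vtag == 0 || vtag > 4095)	/* invalid VLAN ID */
		return (false);
	adapter->num_vlans++;
	return (true);
}
```

Without the first comparison, a handler would touch state for an
interface it does not own, which is the kind of misuse that can panic.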

Approved by: re


# 562a924d 29-Jun-2009 Jack F Vogel <jfv@FreeBSD.org>

Type problem when FreeBSD is in a virtualized environment, the
result was when the RX index wrapped it was converted into some
sort of gibberish and written into the RDT register, effectively
killing the RX side of the thing :)

Approved by: re


# eb956cd0 26-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Use if_maddr_rlock()/if_maddr_runlock() rather than IF_ADDR_LOCK()/
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs. This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.

For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.

Approved by: re (kib)
MFC after: 6 weeks


# 9d81738f 24-Jun-2009 Jack F Vogel <jfv@FreeBSD.org>

Updates for both the em and igb drivers, add support
for multiqueue tx, shared code updates, new device
support, and some bug fixes.


# 1abcdbd1 30-May-2009 Attilio Rao <attilio@FreeBSD.org>

When user_frac in the polling subsystem is low, it keeps the CPU busy
for longer than necessary. Additionally, interfaces are kept
polled (in the tick) even if no more packets are available.
In order to avoid such situations, a new generic mechanism can be
implemented in a proactive way, keeping track of the time spent on any
packet and fragmenting the time for any tick, stopping the processing
as soon as possible.

In order to implement such a mechanism, the polling handler needs to
change, returning the number of packets processed.
While the intended logic is not part of this patch, the polling KPI is
broken by this commit, adding an int return value and the new flag
IFCAP_POLLING_NOCOUNT (which will signal that the return value is
meaningless for the installed handler and checking should be skipped).
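
The shape of the handler change can be sketched as below. This is a
minimal illustration with hypothetical stand-in types (nic_stub,
poll_cmd_stub, nic_poll), not the kernel's actual definitions: the
handler processes at most a given number of packets and now returns how
many it handled.

```c
/* Hypothetical stand-ins for the kernel's polling types. */
enum poll_cmd_stub {
	POLL_ONLY_STUB,
	POLL_AND_CHECK_STATUS_STUB
};

struct nic_stub {
	int rx_pending;		/* packets waiting in the receive ring */
};

/*
 * Poll handler that returns the number of packets processed instead
 * of void, so a caller can stop polling an idle interface early.
 */
static int
nic_poll(struct nic_stub *nic, enum poll_cmd_stub cmd, int budget)
{
	int done = 0;

	(void)cmd;	/* status checks omitted in this sketch */
	while (done < budget && nic->rx_pending > 0) {
		nic->rx_pending--;	/* "process" one packet */
		done++;
	}
	return (done);
}
```

A return value of zero tells the caller that no work was left for this
interface in the current tick.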

Bump __FreeBSD_version in order to signal this change.

Reviewed by: emaste
Sponsored by: Sandvine Incorporated


# 8222d237 13-May-2009 Kip Macy <kmacy@FreeBSD.org>

Call drbr_stats_update to update ifp stats directly when we bypass the buf_ring on transmit


# 8f781951 27-Apr-2009 Kip Macy <kmacy@FreeBSD.org>

fix typo in conditional


# b3e6cec7 27-Apr-2009 Kip Macy <kmacy@FreeBSD.org>

collapse the two em_start_locked routines into one


# eb60d221 27-Apr-2009 Jack F Vogel <jfv@FreeBSD.org>

Correct fat finger mistake


# baf2572c 27-Apr-2009 Jack F Vogel <jfv@FreeBSD.org>

Thanks to Michael Tuexen for tracking down a path where
the watchdog timer was not being rearmed in txeof, and also
a missing case in the new code.

MFC after: 2 weeks


# 1af19022 23-Apr-2009 Kip Macy <kmacy@FreeBSD.org>

fix typo


# 4b4945b6 23-Apr-2009 Kip Macy <kmacy@FreeBSD.org>

fix panic when using msix

Pointed out by Nate Whitehorn


# 173ff3e2 23-Apr-2009 Kip Macy <kmacy@FreeBSD.org>

Make sure the ALTQ case is handled correctly by using drbr_dequeue


# db90f94b 16-Apr-2009 Kip Macy <kmacy@FreeBSD.org>

call base if_qflush routine to flush if_snd


# f2502470 13-Apr-2009 Kip Macy <kmacy@FreeBSD.org>

- define em_transmit and em_qflush
- make buf_ring usage conditional but enabled by default

Reviewed by: jfv


# d035aa2d 09-Apr-2009 Jack F Vogel <jfv@FreeBSD.org>

This delta syncs the em and igb drivers with Intel,
adds header split and SCTP support into the igb driver.
Various small improvements and fixes.

MFC after: 2 weeks


# c84f4dc8 07-Dec-2008 Andrew Thompson <thompsa@FreeBSD.org>

Restore opt_inet.h include which was lost in the last commit.


# daf9197c 26-Nov-2008 Jack F Vogel <jfv@FreeBSD.org>

This delta is primarily a fix for es2lan devices that
will sometimes fail to initialize due to lock
contention with management hardware. However, in order to
deliver that fix it was necessary to take a shared code
update as a whole, and this required scattered changes in
the core code to be compatible.

The em driver now has VLAN HW support added as the igb
driver had previously.

MFC after: ASAP - in time for 7.1 RELEASE


# edb04458 06-Nov-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Hide AF_INET specific ioctl handling under #ifdef INET.

MFC after: 2 months


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 8cfa0ad2 30-Jul-2008 Jack F Vogel <jfv@FreeBSD.org>

Merge of the source for igb and em into dev/e1000, this
proved to be necessary to make the static drivers work
in EITHER/OR or BOTH configurations. Modules will still
build in sys/modules/igb or em as before.

This also updates the igb driver for support for the 82576
adapter, adds shared code fixes, etc.

MFC after: ASAP