History log of /openbsd-current/sys/dev/pci/if_ixl.c
Revision Date Author Comments
# 1.101 24-May-2024 jsg

remove unneeded includes; ok miod@


# 1.100 08-May-2024 jan

ixl(4): force mss of tso packets in hardware supported range.

ok bluhm@
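
A minimal sketch of what forcing the MSS into a supported range can look like. The limits and the names `EX_TSO_MSS_MIN`, `EX_TSO_MSS_MAX` and `ex_clamp_mss()` are placeholders chosen for illustration, not values or identifiers taken from the driver or the datasheet.

```c
#include <stdint.h>

/* Placeholder limits (assumption); the real bounds come from the hardware. */
#define EX_TSO_MSS_MIN	64u
#define EX_TSO_MSS_MAX	9668u

/* Clamp a requested MSS into [EX_TSO_MSS_MIN, EX_TSO_MSS_MAX]. */
static inline uint32_t
ex_clamp_mss(uint32_t mss)
{
	if (mss < EX_TSO_MSS_MIN)
		return EX_TSO_MSS_MIN;
	if (mss > EX_TSO_MSS_MAX)
		return EX_TSO_MSS_MAX;
	return mss;
}
```

With these placeholder limits, a request of 1460 passes through unchanged, while out-of-range requests are pulled to the nearest bound.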


# 1.99 07-May-2024 jan

Additional check for TSO packets with 0 MSS.

Tested by bluhm

ok bluhm@


# 1.98 12-Apr-2024 jan

remove useless includes of ip.h and ip6.h

ok bluhm


Revision tags: OPENBSD_7_5_BASE
# 1.97 14-Feb-2024 bluhm

Check IP length in ether_extract_headers().

For LRO with ix(4) it is necessary to detect ethernet padding.
Extract ip_len and ip6_plen from the mbuf and provide it to the
drivers.
Add extended sanity checks, such as an IP packet that is shorter than
the TCP header. This prevents offloading bogus packets to network
hardware.
Also iphlen of extracted headers contains header length for IPv4
and IPv6, to make code in drivers simpler.

OK mglocker@


# 1.96 13-Feb-2024 bluhm

Analyse header layout in ether_extract_headers().

Several drivers need IPv4 header length and TCP offset for checksum
offload, TSO and LRO. Accessing these fields directly caused crashes
on sparc64 due to misaligned access. It cannot be guaranteed that
IP and TCP headers are 4 byte aligned at driver level. Also gcc 4.2.1
assumes that bit fields can be accessed with 32 bit load instructions.

Use memcpy() in ether_extract_headers() to get the bits from IPv4
and TCP header and store the header length in struct ether_extracted.
From there network drivers can easily use it without caring about
alignment and bit shift. Do some sanity checks with the length
values to prevent invalid values from evil packets from being stored
in hardware registers. If a check fails, clear the pointer to the
header to hide it from the driver. Add debug prints that help to
figure out the reason for bad packets and provide information when
debugging drivers.

OK mglocker@
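
The memcpy() approach described above can be sketched as follows. This is an illustration under stated assumptions, not the kernel's ether_extract_headers(): `struct ex_hdrinfo` and `ex_extract_ipv4()` are hypothetical names, and only the fixed 20-byte IPv4 header is examined.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result struct (assumption, not the kernel's layout). */
struct ex_hdrinfo {
	unsigned int	iphlen;	/* IPv4 header length in bytes */
	unsigned int	iplen;	/* IPv4 total length in bytes */
};

/*
 * Copy the fixed IPv4 header into a local buffer before looking at it,
 * so no aligned load is ever issued against the packet memory itself.
 * Returns 0 on success, -1 if the lengths do not make sense.
 */
static int
ex_extract_ipv4(const uint8_t *pkt, size_t len, struct ex_hdrinfo *out)
{
	uint8_t hdr[20];
	unsigned int hlen, totlen;

	if (len < sizeof(hdr))
		return -1;
	memcpy(hdr, pkt, sizeof(hdr));	/* pkt may be misaligned */

	if ((hdr[0] >> 4) != 4)		/* version nibble */
		return -1;
	hlen = (hdr[0] & 0x0f) << 2;	/* header length in 32-bit words */
	totlen = ((unsigned int)hdr[2] << 8) | hdr[3];

	/* sanity checks: reject values that must not reach hardware */
	if (hlen < sizeof(hdr) || hlen > totlen || totlen > len)
		return -1;

	out->iphlen = hlen;
	out->iplen = totlen;
	return 0;
}
```

A caller that gets -1 back would simply skip the offload for that packet, which mirrors the idea of hiding a bad header from the driver.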


# 1.95 07-Jan-2024 bluhm

In ixl(4) attach, initialize mutex before using it.

Function ixl_get_link_status() calls ixl_set_link_status() which
locks sc_link_state_mtx. Move initialization of mutex before calling
ixl_get_link_status(). This makes witness happy.

Bug reported and fix tested by Hrvoje Popovski; OK miod@


# 1.94 30-Dec-2023 bluhm

Set ixl(4) IXL_TX_PKT_DESCS to 8.

Mark Patruck has reported problems with ixl revision 1.90 TSO diff.
He uses ixl device passthrough from Linux via KVM to OpenBSD guest.
After a few hours of operation, interface locks up with oactive.
The problem also occurs with TSO disabled, after the TSO diff had
been committed. deraadt@ has seen similar problems with ixl interface
on sparc64.
Changing IXL_TX_PKT_DESCS back to the original value 8 fixes the
lockup and even TSO on the hardware still works. FreeBSD and NetBSD
also use this value. The 32 was copied from ix(4) TSO diff and is
not necessary for ixl(4).

debugged with jan@; lot of bisecting and testing by Mark Patruck
OK mglocker@ patrick@


# 1.93 10-Nov-2023 bluhm

Make ifq and ifiq interface MP safe.

Rename ifq_set_maxlen() to ifq_init_maxlen(). This function neither
uses WRITE_ONCE() nor a mutex and is called before the ifq mutex
is initialized. The new name expresses that it should be used only
during interface attach when there is no concurrency.

Protect ifq_len(), ifq_empty(), ifiq_len(), and ifiq_empty() with
READ_ONCE(). They can be used without lock as they only read a
single integer.

OK dlg@


# 1.92 20-Oct-2023 jan

Add missing tcps_outpkttso counter to ixl(4) TSO setup.

ok bluhm@


# 1.91 20-Oct-2023 jan

Improve bad comment.

pointed out by kn@

ok kn@


# 1.90 19-Oct-2023 jan

Enable TCP Segmentation Offloading for ixl(4)

Tested on amd64 and sparc64.
Also tested by bluhm@.

ok bluhm@


Revision tags: OPENBSD_7_4_BASE
# 1.89 29-Sep-2023 bluhm

Replace kernel lock with mutex in ixl(4) media status.

Witness found that sc_atq_mtx mutex is held when kernel lock is
acquired. This might cause a deadlock. Protect sc_media_status
and sc_media_active with the link state mutex instead. Global
fields ifm->ifm_status and ifm->ifm_active are still protected by
kernel lock.

OK tobhe@


# 1.88 19-Jul-2023 jan

Protect ixl(4) admin queue with mutex(9).

with tweaks from bluhm

tested by bluhm

ok bluhm@


Revision tags: OPENBSD_7_3_BASE
# 1.87 06-Feb-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@, dlg@, naddy@ and bluhm@

"go for it" deraadt@

ok naddy@, mvs@


# 1.86 26-Jan-2023 deraadt

back out "consolidate mbuf header parsing on device driver layer":
an easily repeatable ASSERT happens seconds after starting compiles over nfs.


# 1.85 24-Jan-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@ and dlg@

ok mvs@, dlg@


Revision tags: OPENBSD_7_2_BASE
# 1.84 05-Aug-2022 bluhm

The netlock for SIOCSIFMEDIA and SIOCGIFMEDIA ioctl is not necessary.
Legacy drivers run with kernel lock, interface media is MP safe or
has kernel lock. Assert kernel lock in ix(4) and ixl(4).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.83 11-Mar-2022 mpi

Constify struct cfattach.


# 1.82 10-Feb-2022 bluhm

Enable receive checksum offloading on ixl(4) network interfaces.
from jan@; test and OK dlg@


# 1.81 09-Feb-2022 dlg

enable hw vlan tag handling in the rx path too.

also tested on both x86 and sparc64.


# 1.80 09-Feb-2022 dlg

enable hardware vlan tagging.

tested on sparc64 and x86


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.76 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using an "int sized"
instruction, rather than the minimally sized byte instruction. This is
permitted by the language lawyers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly it allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
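
The poll-and-difference scheme described above can be sketched like this. The 48-bit width matches the counter sizes mentioned in r1.61, but `struct ex_counter` and `ex_counter_update()` are hypothetical names, and the sketch assumes the caller seeds `last` with the first raw reading because the hardware counter is never reset.

```c
#include <stdint.h>

#define EX_CTR48_MASK	((UINT64_C(1) << 48) - 1)

/* Software view of one free-running 48-bit hardware counter. */
struct ex_counter {
	uint64_t	last;	/* raw value seen at the previous poll */
	uint64_t	total;	/* accumulated count since the baseline */
};

/*
 * Fold a new raw reading into the running total. The masked
 * subtraction gives the right delta even if the counter wrapped
 * once between two polls.
 */
static void
ex_counter_update(struct ex_counter *c, uint64_t raw)
{
	uint64_t delta;

	raw &= EX_CTR48_MASK;
	delta = (raw - c->last) & EX_CTR48_MASK;
	c->total += delta;
	c->last = raw;
}
```

This only stays correct if the poll interval is short enough that the counter cannot wrap more than once between readings.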


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.
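
For readers unfamiliar with the hash itself, a generic bitwise Toeplitz hash can be sketched as below. This is the textbook algorithm, not the driver's or the chip's implementation; `ex_toeplitz()` is a hypothetical name, and the key must be at least 4 bytes longer than the input.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Generic Toeplitz hash: for every set bit of the input (most
 * significant bit first), xor in the 32-bit window of the key that
 * starts at the same bit position. Returns 0 if the key is too short.
 */
static uint32_t
ex_toeplitz(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t hash = 0, window;
	size_t i;
	int b;

	if (keylen < datalen + 4)
		return 0;

	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < datalen; i++) {
		for (b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			/* slide the window one bit along the key */
			window <<= 1;
			if (key[i + 4] & (1u << b))
				window |= 1;
		}
	}
	return hash;
}
```

A receive path would typically reduce such a hash to a ring index, which is how packets of one flow end up on the same rx ring.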


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.
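
The queue-count constraints in the list above can be sketched as a small helper. This assumes the count is rounded down to a power of 2 after applying the cap; the function names are hypothetical and this is not the intrmap code.

```c
/* Largest power of 2 that is <= n, or 0 if n is 0. */
static unsigned int
ex_rounddown_pow2(unsigned int n)
{
	unsigned int p = 1;

	if (n == 0)
		return 0;
	while (p <= n / 2)
		p *= 2;
	return p;
}

/* Pick a queue count: at most maxq, at most ncpus, power of 2. */
static unsigned int
ex_nqueues(unsigned int ncpus, unsigned int maxq)
{
	unsigned int n = ncpus < maxq ? ncpus : maxq;

	return ex_rounddown_pow2(n);
}
```

Under these assumptions a 6 cpu machine with a cap of 8 would get 4 queues.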

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
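
The mistake described above is easy to reproduce in isolation. In this sketch `EX_SPEED_SHIFT` is a made-up bit position, and ISSET() is defined locally in its usual `((t) & (f))` form so the example builds outside the kernel.

```c
#include <stdint.h>

#ifndef ISSET
#define ISSET(t, f)	((t) & (f))
#endif

#define EX_SPEED_SHIFT	3	/* hypothetical bit position */

/* Buggy form: tests against the shift count, not the bit it names. */
static int
ex_speed_wrong(uint32_t bits)
{
	return ISSET(bits, EX_SPEED_SHIFT) != 0;
}

/* Fixed form: turn the shift into a mask first. */
static int
ex_speed_right(uint32_t bits)
{
	return ISSET(bits, 1u << EX_SPEED_SHIFT) != 0;
}
```

With bit 3 set (value 8) the buggy test computes `8 & 3`, which is 0, so the speed is never matched; the fixed test computes `8 & 8` and matches.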


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us up to
bump that number.


# 1.5 15-Dec-2017 dlg

put where im up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you dont have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

im pretty sure we dont need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.100 08-May-2024 jan

ixl(4): force mss of tso packets in hardware supported range.

ok bluhm@


# 1.99 07-May-2024 jan

Additional check for TSO packets with 0 MSS.

Tested by bluhm

ok bluhm@


# 1.98 12-Apr-2024 jan

remove useless includes of ip.h and ip6.h

ok bluhm


Revision tags: OPENBSD_7_5_BASE
# 1.97 14-Feb-2024 bluhm

Check IP length in ether_extract_headers().

For LRO with ix(4) it is necessary to detect ethernet padding.
Extract ip_len and ip6_plen from the mbuf and provide it to the
drivers.
Add extended sanitity checks, like IP packet is shorter than TCP
header. This prevents offloading to network hardware with bougus
packets.
Also iphlen of extracted headers contains header length for IPv4
and IPv6, to make code in drivers simpler.

OK mglocker@


# 1.96 13-Feb-2024 bluhm

Analyse header layout in ether_extract_headers().

Several drivers need IPv4 header length and TCP offset for checksum
offload, TSO and LRO. Accessing these fields directly caused crashes
on sparc64 due to misaligned access. It cannot be guaranteed that
IP and TCP header is 4 byte aligned in driver level. Also gcc 4.2.1
assumes that bit fields can be accessed with 32 bit load instructions.

Use memcpy() in ether_extract_headers() to get the bits from IPv4
and TCP header and store the header length in struct ether_extracted.
From there network drivers can esily use it without caring about
alignment and bit shift. Do some sanity checks with the length
values to prevent that invalid values from evil packets get stored
into hardware registers. If check fails, clear the pointer to the
header to hide it from the driver. Add debug prints that help to
figure out the reason for bad packets and provide information when
debugging drivers.

OK mglocker@


# 1.95 07-Jan-2024 bluhm

In ixl(4) attach, initialize mutex before using it.

Function ixl_get_link_status() calls ixl_set_link_status() which
locks sc_link_state_mtx. Move initilization of mutex before calling
ixl_get_link_status(). This makes witness happy.

Bug reported and fix tested by Hrvoje Popovski; OK miod@


# 1.94 30-Dec-2023 bluhm

Set ixl(4) IXL_TX_PKT_DESCS to 8.

Mark Patruck has reported problems with ixl revision 1.90 TSO diff.
He uses ixl device passthrough from Linux via KVM to OpenBSD guest.
After a few hours of operation, interface locks up with oactive.
The problem also occures with TSO disabled, after the TSO diff had
been commited. deraadt@ has seen similar problems with ixl interface
on sparc64.
Changing IXL_TX_PKT_DESCS back to the original value 8 fixes the
lockup and even TSO on the hardware still works. FreeBSD and NetBSD
also use this value. The 32 was copied from ix(4) TSO diff and is
not necessary for ixl(4).

debugged with jan@; lot of bisecting and testing by Mark Patruck
OK mglocker@ patrick@


# 1.93 10-Nov-2023 bluhm

Make ifq and ifiq interface MP safe.

Rename ifq_set_maxlen() to ifq_init_maxlen(). This function neither
uses WRITE_ONCE() nor a mutex and is called before the ifq mutex
is initialized. The new name expresses that it should be used only
during interface attach when there is no concurrency.

Protect ifq_len(), ifq_empty(), ifiq_len(), and ifiq_empty() with
READ_ONCE(). They can be used without lock as they only read a
single integer.

OK dlg@


# 1.92 20-Oct-2023 jan

Add missing tcps_outpkttso counter to ixl(4) TSO setup.

ok bluhm@


# 1.91 20-Oct-2023 jan

Improve bad comment.

pointed out by kn@

ok kn@


# 1.90 19-Oct-2023 jan

Enable TCP Segmentation Offloading for ixl(4)

Tested on amd64 and sparc64.
Also tested by bluhm@.

ok bluhm@


Revision tags: OPENBSD_7_4_BASE
# 1.89 29-Sep-2023 bluhm

Replace kernel lock with mutex in ixl(4) media status.

Witness found that sc_atq_mtx mutex is held when kernel lock is
acquired. This might cause a deadlock. Protect sc_media_status
and sc_media_active with the link state mutex instead. Global
fields ifm->ifm_status and ifm->ifm_active are still protected by
kernel lock.

OK tobhe@


# 1.88 19-Jul-2023 jan

Protect ixl(4) admin queue with mutex(9).

with tweaks from bluhm

tested by bluhm

ok bluhm@


Revision tags: OPENBSD_7_3_BASE
# 1.87 06-Feb-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@, dlg@, naddy@ and bluhm@

"go for it" deraadt@

ok naddy@, mvs@


# 1.86 26-Jan-2023 deraadt

backing "consolidate mbuf header parsing on device driver layer"
easily repeatable ASSERT happens seconds after starting compiles over nfs.


# 1.85 24-Jan-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@ and dlg@

ok mvs@, dlg@


Revision tags: OPENBSD_7_2_BASE
# 1.84 05-Aug-2022 bluhm

The netlock for SIOCSIFMEDIA and SIOCGIFMEDIA ioctl is not necessary.
Legacy drivers run with kernel lock, interface media is MP safe or
has kernel lock. Assert kernel lock in ix(4) and ixl(4).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.83 11-Mar-2022 mpi

Constify struct cfattach.


# 1.82 10-Feb-2022 bluhm

Enable receive checksum offloading on ixl(4) network interfaces.
from jan@; test and OK dlg@


# 1.81 09-Feb-2022 dlg

enable hw vlan tag handling in the rx path too.

also tested on both x86 and sparc64.


# 1.80 09-Feb-2022 dlg

enable hardware vlan tagging.

tested on sparc64 and x86


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.176 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using an "int sized"
instructions, rather than the minimally sized byte instruction. This is
permitted by the language laywers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and removes the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly is allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesn't support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
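
The poll-and-diff scheme described above might look something like the
following sketch. It is not the driver's code: the struct and function names
are hypothetical, and a 48 bit counter width is assumed (r1.61 mentions 48
and 64 bit counter reads). The idea is that subtracting the previous raw
value modulo the counter width still yields the right delta after one wrap.

```c
#include <stdint.h>

/* Assumed counter width for this sketch: 48 bits. */
#define COUNTER48_MASK	((UINT64_C(1) << 48) - 1)

/* Hypothetical software state kept for one hardware counter. */
struct hw_counter {
	uint64_t	last;	/* raw value seen at the previous poll */
	uint64_t	total;	/* accumulated 64 bit value */
};

/* Fold a freshly read raw counter value into the running total. */
static void
hw_counter_update(struct hw_counter *c, uint64_t raw)
{
	/* Subtraction modulo 2^48 is correct across a single wrap. */
	uint64_t delta = (raw - c->last) & COUNTER48_MASK;

	c->total += delta;
	c->last = raw;
}
```

This only works if the poll interval is short enough that the counter cannot
wrap more than once between two reads.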


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.
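
To make the power of 2 constraint concrete, here is a small sketch of
rounding a cpu count down to a usable queue count. This is not how
intrmap_create is implemented; queue_count and its parameters are
hypothetical names used only for illustration.

```c
/*
 * Hypothetical helper: pick the largest power of 2 that does not
 * exceed either the number of cpus or the configured maximum.
 * Always returns at least 1.
 */
static unsigned int
queue_count(unsigned int ncpus, unsigned int maxq)
{
	unsigned int n = 1;

	if (ncpus > maxq)
		ncpus = maxq;

	/* Double n while the next power of 2 still fits. */
	while (n * 2 <= ncpus)
		n *= 2;

	return n;
}
```

Under these assumptions a box with 6 cpus and a limit of 8 would end up
with 4 queues.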

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.
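
A minimal sketch of that mistake, using a made-up bit position rather than
the driver's real link speed constants. ISSET is defined here the way
sys/param.h defines it; SPEED_SHIFT_10G and the two test functions are
hypothetical.

```c
#include <stdint.h>

#ifndef ISSET
#define ISSET(t, f)	((t) & (f))
#endif

/* Hypothetical bit position, for illustration only. */
#define SPEED_SHIFT_10G	3

/* Buggy form: tests against the shift count itself (binary 011). */
static int
speed_is_10g_wrong(uint32_t bits)
{
	return ISSET(bits, SPEED_SHIFT_10G) != 0;
}

/* Fixed form: tests against the mask built from the shift. */
static int
speed_is_10g_right(uint32_t bits)
{
	return ISSET(bits, 1U << SPEED_SHIFT_10G) != 0;
}
```

With bit 3 set the buggy test reports no match, and with only the two low
bits set it reports a false match, which is the kind of misalignment the
message describes.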

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us up to
bump that number.


# 1.5 15-Dec-2017 dlg

put where i'm up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you don't have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

i'm pretty sure we don't need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.98 12-Apr-2024 jan

remove useless includes of ip.h and ip6.h

ok bluhm


Revision tags: OPENBSD_7_5_BASE
# 1.97 14-Feb-2024 bluhm

Check IP length in ether_extract_headers().

For LRO with ix(4) it is necessary to detect ethernet padding.
Extract ip_len and ip6_plen from the mbuf and provide it to the
drivers.
Add extended sanitity checks, like IP packet is shorter than TCP
header. This prevents offloading to network hardware with bougus
packets.
Also iphlen of extracted headers contains header length for IPv4
and IPv6, to make code in drivers simpler.

OK mglocker@


# 1.96 13-Feb-2024 bluhm

Analyse header layout in ether_extract_headers().

Several drivers need IPv4 header length and TCP offset for checksum
offload, TSO and LRO. Accessing these fields directly caused crashes
on sparc64 due to misaligned access. It cannot be guaranteed that
IP and TCP header is 4 byte aligned in driver level. Also gcc 4.2.1
assumes that bit fields can be accessed with 32 bit load instructions.

Use memcpy() in ether_extract_headers() to get the bits from IPv4
and TCP header and store the header length in struct ether_extracted.
From there network drivers can esily use it without caring about
alignment and bit shift. Do some sanity checks with the length
values to prevent that invalid values from evil packets get stored
into hardware registers. If check fails, clear the pointer to the
header to hide it from the driver. Add debug prints that help to
figure out the reason for bad packets and provide information when
debugging drivers.

OK mglocker@


# 1.95 07-Jan-2024 bluhm

In ixl(4) attach, initialize mutex before using it.

Function ixl_get_link_status() calls ixl_set_link_status() which
locks sc_link_state_mtx. Move initilization of mutex before calling
ixl_get_link_status(). This makes witness happy.

Bug reported and fix tested by Hrvoje Popovski; OK miod@


# 1.94 30-Dec-2023 bluhm

Set ixl(4) IXL_TX_PKT_DESCS to 8.

Mark Patruck has reported problems with ixl revision 1.90 TSO diff.
He uses ixl device passthrough from Linux via KVM to OpenBSD guest.
After a few hours of operation, interface locks up with oactive.
The problem also occures with TSO disabled, after the TSO diff had
been commited. deraadt@ has seen similar problems with ixl interface
on sparc64.
Changing IXL_TX_PKT_DESCS back to the original value 8 fixes the
lockup and even TSO on the hardware still works. FreeBSD and NetBSD
also use this value. The 32 was copied from ix(4) TSO diff and is
not necessary for ixl(4).

debugged with jan@; lot of bisecting and testing by Mark Patruck
OK mglocker@ patrick@


# 1.93 10-Nov-2023 bluhm

Make ifq and ifiq interface MP safe.

Rename ifq_set_maxlen() to ifq_init_maxlen(). This function neither
uses WRITE_ONCE() nor a mutex and is called before the ifq mutex
is initialized. The new name expresses that it should be used only
during interface attach when there is no concurrency.

Protect ifq_len(), ifq_empty(), ifiq_len(), and ifiq_empty() with
READ_ONCE(). They can be used without lock as they only read a
single integer.

OK dlg@


# 1.92 20-Oct-2023 jan

Add missing tcps_outpkttso counter to ixl(4) TSO setup.

ok bluhm@


# 1.91 20-Oct-2023 jan

Improve bad comment.

pointed out by kn@

ok kn@


# 1.90 19-Oct-2023 jan

Enable TCP Segmentation Offloading for ixl(4)

Tested on amd64 and sparc64.
Also tested by bluhm@.

ok bluhm@


Revision tags: OPENBSD_7_4_BASE
# 1.89 29-Sep-2023 bluhm

Replace kernel lock with mutex in ixl(4) media status.

Witness found that sc_atq_mtx mutex is held when kernel lock is
acquired. This might cause a deadlock. Protect sc_media_status
and sc_media_active with the link state mutex instead. Global
fields ifm->ifm_status and ifm->ifm_active are still protected by
kernel lock.

OK tobhe@


# 1.88 19-Jul-2023 jan

Protect ixl(4) admin queue with mutex(9).

with tweaks from bluhm

tested by bluhm

ok bluhm@


Revision tags: OPENBSD_7_3_BASE
# 1.87 06-Feb-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@, dlg@, naddy@ and bluhm@

"go for it" deraadt@

ok naddy@, mvs@


# 1.86 26-Jan-2023 deraadt

backing "consolidate mbuf header parsing on device driver layer"
easily repeatable ASSERT happens seconds after starting compiles over nfs.


# 1.85 24-Jan-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@ and dlg@

ok mvs@, dlg@


Revision tags: OPENBSD_7_2_BASE
# 1.84 05-Aug-2022 bluhm

The netlock for SIOCSIFMEDIA and SIOCGIFMEDIA ioctl is not necessary.
Legacy drivers run with kernel lock, interface media is MP safe or
has kernel lock. Assert kernel lock in ix(4) and ixl(4).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.83 11-Mar-2022 mpi

Constify struct cfattach.


# 1.82 10-Feb-2022 bluhm

Enable receive checksum offloading on ixl(4) network interfaces.
from jan@; test and OK dlg@


# 1.81 09-Feb-2022 dlg

enable hw vlan tag handling in the rx path too.

also tested on both x86 and sparc64.


# 1.80 09-Feb-2022 dlg

enable hardware vlan tagging.

tested on sparc64 and x86


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.176 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using an "int sized"
instructions, rather than the minimally sized byte instruction. This is
permitted by the language laywers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and removes the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly is allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual regsiters on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value localy.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and move it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@

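The check described in 1.29 can be sketched roughly as follows. This is only an illustration: the flag name AQ_FLAG_DONE, the struct aq_desc layout, and the aq_complete() helper are hypothetical and are not taken from if_ixl.c. The idea is that the consumer index only advances past entries the firmware has marked done.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag and descriptor layout, for illustration only. */
#define AQ_FLAG_DONE	(1U << 0)

struct aq_desc {
	uint16_t flags;
	uint16_t opcode;
};

/*
 * Complete entries from cons up to (not including) prod, stopping at the
 * first one the firmware has not marked done.  Returns the new consumer
 * index.
 */
static unsigned int
aq_complete(struct aq_desc *ring, unsigned int nslots,
    unsigned int cons, unsigned int prod)
{
	assert(nslots > 0);

	while (cons != prod) {
		if (!(ring[cons].flags & AQ_FLAG_DONE))
			break;		/* firmware is still working on it */
		ring[cons].flags = 0;	/* hand the slot back for reuse */
		cons = (cons + 1) % nslots;
	}
	return cons;
}
```

Stopping at the first entry that is not done keeps completions in submission order, which is an assumption of this sketch rather than something the commit message states.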

# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@
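
One way the corruption described in 1.27 can arise is shown in the sketch below, which uses the SLIST macros from <sys/queue.h>. The struct event type and both helper functions are hypothetical, and the sketch assumes the element was queued with SLIST_INSERT_HEAD; the driver's actual queueing code is not shown in this log. Inserting an element that is already at the head makes its next pointer refer to itself, so a list walk never reaches the end.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical event type, for illustration only. */
struct event {
	int id;
	SLIST_ENTRY(event) entry;
};

SLIST_HEAD(event_list, event);

/* Count list elements, giving up after `limit` steps so a cycle is visible. */
static int
count_bounded(struct event_list *list, int limit)
{
	struct event *e;
	int n = 0;

	assert(limit > 0);
	SLIST_FOREACH(e, list, entry) {
		if (++n == limit)
			break;
	}
	return n;
}

/* Queue the same event `times` times and report how far a walk gets. */
static int
queue_same_event(int times, int limit)
{
	struct event_list list = SLIST_HEAD_INITIALIZER(list);
	struct event ev = { .id = 1 };
	int i;

	for (i = 0; i < times; i++)
		SLIST_INSERT_HEAD(&list, &ev, entry);
	return count_bounded(&list, limit);
}
```

With one insertion the walk sees a single element; with two insertions of the same element the walk only stops because of the step limit.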


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us up to
bump that number.


# 1.5 15-Dec-2017 dlg

put where im up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you dont have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

im pretty sure we dont need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.97 14-Feb-2024 bluhm

Check IP length in ether_extract_headers().

For LRO with ix(4) it is necessary to detect ethernet padding.
Extract ip_len and ip6_plen from the mbuf and provide it to the
drivers.
Add extended sanitity checks, like IP packet is shorter than TCP
header. This prevents offloading to network hardware with bougus
packets.
Also iphlen of extracted headers contains header length for IPv4
and IPv6, to make code in drivers simpler.

OK mglocker@


# 1.96 13-Feb-2024 bluhm

Analyse header layout in ether_extract_headers().

Several drivers need IPv4 header length and TCP offset for checksum
offload, TSO and LRO. Accessing these fields directly caused crashes
on sparc64 due to misaligned access. It cannot be guaranteed that
IP and TCP header is 4 byte aligned in driver level. Also gcc 4.2.1
assumes that bit fields can be accessed with 32 bit load instructions.

Use memcpy() in ether_extract_headers() to get the bits from IPv4
and TCP header and store the header length in struct ether_extracted.
From there network drivers can esily use it without caring about
alignment and bit shift. Do some sanity checks with the length
values to prevent that invalid values from evil packets get stored
into hardware registers. If check fails, clear the pointer to the
header to hide it from the driver. Add debug prints that help to
figure out the reason for bad packets and provide information when
debugging drivers.

OK mglocker@


# 1.95 07-Jan-2024 bluhm

In ixl(4) attach, initialize mutex before using it.

Function ixl_get_link_status() calls ixl_set_link_status() which
locks sc_link_state_mtx. Move initilization of mutex before calling
ixl_get_link_status(). This makes witness happy.

Bug reported and fix tested by Hrvoje Popovski; OK miod@


# 1.94 30-Dec-2023 bluhm

Set ixl(4) IXL_TX_PKT_DESCS to 8.

Mark Patruck has reported problems with ixl revision 1.90 TSO diff.
He uses ixl device passthrough from Linux via KVM to OpenBSD guest.
After a few hours of operation, interface locks up with oactive.
The problem also occures with TSO disabled, after the TSO diff had
been commited. deraadt@ has seen similar problems with ixl interface
on sparc64.
Changing IXL_TX_PKT_DESCS back to the original value 8 fixes the
lockup and even TSO on the hardware still works. FreeBSD and NetBSD
also use this value. The 32 was copied from ix(4) TSO diff and is
not necessary for ixl(4).

debugged with jan@; lot of bisecting and testing by Mark Patruck
OK mglocker@ patrick@


# 1.93 10-Nov-2023 bluhm

Make ifq and ifiq interface MP safe.

Rename ifq_set_maxlen() to ifq_init_maxlen(). This function neither
uses WRITE_ONCE() nor a mutex and is called before the ifq mutex
is initialized. The new name expresses that it should be used only
during interface attach when there is no concurrency.

Protect ifq_len(), ifq_empty(), ifiq_len(), and ifiq_empty() with
READ_ONCE(). They can be used without lock as they only read a
single integer.

OK dlg@


# 1.92 20-Oct-2023 jan

Add missing tcps_outpkttso counter to ixl(4) TSO setup.

ok bluhm@


# 1.91 20-Oct-2023 jan

Improve bad comment.

pointed out by kn@

ok kn@


# 1.90 19-Oct-2023 jan

Enable TCP Segmentation Offloading for ixl(4)

Tested on amd64 and sparc64.
Also tested by bluhm@.

ok bluhm@


Revision tags: OPENBSD_7_4_BASE
# 1.89 29-Sep-2023 bluhm

Replace kernel lock with mutex in ixl(4) media status.

Witness found that sc_atq_mtx mutex is held when kernel lock is
acquired. This might cause a deadlock. Protect sc_media_status
and sc_media_active with the link state mutex instead. Global
fields ifm->ifm_status and ifm->ifm_active are still protected by
kernel lock.

OK tobhe@


# 1.88 19-Jul-2023 jan

Protect ixl(4) admin queue with mutex(9).

with tweaks from bluhm

tested by bluhm

ok bluhm@


Revision tags: OPENBSD_7_3_BASE
# 1.87 06-Feb-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@, dlg@, naddy@ and bluhm@

"go for it" deraadt@

ok naddy@, mvs@


# 1.86 26-Jan-2023 deraadt

backing out "consolidate mbuf header parsing on device driver layer"
easily repeatable ASSERT happens seconds after starting compiles over nfs.


# 1.85 24-Jan-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@ and dlg@

ok mvs@, dlg@


Revision tags: OPENBSD_7_2_BASE
# 1.84 05-Aug-2022 bluhm

The netlock for SIOCSIFMEDIA and SIOCGIFMEDIA ioctl is not necessary.
Legacy drivers run with kernel lock, interface media is MP safe or
has kernel lock. Assert kernel lock in ix(4) and ixl(4).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.83 11-Mar-2022 mpi

Constify struct cfattach.


# 1.82 10-Feb-2022 bluhm

Enable receive checksum offloading on ixl(4) network interfaces.
from jan@; test and OK dlg@


# 1.81 09-Feb-2022 dlg

enable hw vlan tag handling in the rx path too.

also tested on both x86 and sparc64.


# 1.80 09-Feb-2022 dlg

enable hardware vlan tagging.

tested on sparc64 and x86


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.76 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using an "int sized"
instruction, rather than the minimally sized byte instruction. This is
permitted by the language lawyers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly it allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesn't support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
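
The polling scheme described above can be sketched as follows. This is
an illustration only, not the driver's kstat code: struct counter48 and
both functions are hypothetical names, and a 48 bit wide hardware
counter is assumed. Each poll adds the difference since the previous
poll to a 64 bit software total, and the subtraction is done modulo
2^48 so that a single wrap between polls is still counted correctly.

```c
#include <stdint.h>

#define COUNTER48_MASK	((UINT64_C(1) << 48) - 1)

/* Hypothetical software state kept per hardware counter. */
struct counter48 {
	uint64_t last;	/* raw value seen at the previous poll */
	uint64_t total;	/* accumulated 64 bit count */
};

/* Take the first reading as the baseline; counters are never reset. */
static void
counter48_init(struct counter48 *c, uint64_t raw)
{
	c->last = raw & COUNTER48_MASK;
	c->total = 0;
}

/* Fold a new raw reading into the running total. */
static void
counter48_update(struct counter48 *c, uint64_t raw)
{
	uint64_t delta;

	raw &= COUNTER48_MASK;
	/* Modulo 2^48 subtraction survives one wrap between polls. */
	delta = (raw - c->last) & COUNTER48_MASK;
	c->total += delta;
	c->last = raw;
}
```

The poll interval has to be short enough that the counter cannot wrap
more than once between two readings, which is why the chip must be
polled periodically.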


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.
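
For readers unfamiliar with the hash itself, here is a minimal software
sketch of a Toeplitz hash. It is not taken from the driver, which
programs the chip to do this in hardware; the function name
toeplitz_hash() and its signature are assumptions made for this
example. For every set bit of the input, taken most significant bit
first, the 32 bit window of the key starting at that bit position is
XORed into the result.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute a 32 bit Toeplitz hash of data[] under
 * key[].  Key bits beyond keylen are treated as zero.
 */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t hash = 0;
	uint32_t window;	/* 32 key bits at the current bit offset */
	size_t next = 4;	/* index of the next key byte to shift in */
	size_t i;
	int b;

	if (keylen < 4)
		return 0;

	window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
	    (uint32_t)key[2] << 8 | (uint32_t)key[3];

	for (i = 0; i < datalen; i++, next++) {
		uint8_t nextbyte = next < keylen ? key[next] : 0;

		for (b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			/* Slide the window one bit along the key. */
			window = (window << 1) | ((nextbyte >> b) & 1u);
		}
	}
	return hash;
}
```

On the receive side, a hash like this is what lets packets of the same
flow land on the same ring while different flows spread over all rings.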


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@
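
The power of 2 constraint from the notes above can be sketched like
this. The helpers are hypothetical and do not show how
intrmap_create() works internally; they only illustrate one plausible
policy, namely capping the CPU count at a maximum and rounding down to
a power of 2. Under that assumption a 6 CPU box with a limit of 8
would end up with 4 queues.

```c
/* Largest power of 2 that is <= n, or 0 if n is 0. */
static unsigned int
rounddown_pow2(unsigned int n)
{
	unsigned int p = 1;

	if (n == 0)
		return 0;
	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Hypothetical policy: use at most maxq queues, never more than there
 * are CPUs, and always a power of 2.
 */
static unsigned int
pick_nqueues(unsigned int ncpus, unsigned int maxq)
{
	unsigned int n = ncpus < maxq ? ncpus : maxq;

	return rounddown_pow2(n);
}
```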


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
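
The shift versus mask mix-up is easy to reproduce in isolation. The
sketch below is illustrative only: the table, its bit positions and
the function names are made up and do not reflect the real link speed
encoding of the hardware. ISSET() is written out the way it is
commonly defined, as a plain bitwise AND.

```c
#include <stddef.h>
#include <stdint.h>

#define ISSET(t, f)	((t) & (f))

/* Hypothetical table: bit position in the speed field -> baudrate. */
struct speed_entry {
	unsigned int shift;
	uint64_t baudrate;
};

static const struct speed_entry speeds[] = {
	{ 0, UINT64_C(100000000) },	/* 100M */
	{ 1, UINT64_C(1000000000) },	/* 1G */
	{ 2, UINT64_C(10000000000) },	/* 10G */
};

#define NSPEEDS	(sizeof(speeds) / sizeof(speeds[0]))

/* Buggy variant: tests the shift count itself against the bitfield. */
static uint64_t
baudrate_buggy(uint8_t bits)
{
	size_t i;

	for (i = 0; i < NSPEEDS; i++) {
		if (ISSET(bits, speeds[i].shift))
			return speeds[i].baudrate;
	}
	return 0;
}

/* Fixed variant: turns the shift into a mask before testing. */
static uint64_t
baudrate_fixed(uint8_t bits)
{
	size_t i;

	for (i = 0; i < NSPEEDS; i++) {
		if (ISSET(bits, 1u << speeds[i].shift))
			return speeds[i].baudrate;
	}
	return 0;
}
```

With only bit 2 set in the field, the buggy variant compares 0x04
against 0, 1 and 2, never matches, and reports a baudrate of 0, while
the fixed variant finds the 10G entry.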


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@
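
One way a singly linked list ends up looping forever is inserting the
same element twice. The sketch below uses a hand-rolled list rather
than the SLIST macros, and the names are hypothetical; it shows the
general failure mode, not the exact sequence in the driver. A head
insert sets the element's next pointer to the current head, so if the
element already is the head, it ends up pointing at itself.

```c
#include <stddef.h>

/* Hypothetical queued event with an intrusive next pointer. */
struct event {
	struct event *next;
	int id;
};

struct event_list {
	struct event *head;
};

/* Same steps as a singly linked head insert. */
static void
event_insert_head(struct event_list *l, struct event *e)
{
	e->next = l->head;
	l->head = e;
}

/* Walk the list, giving up after limit steps; returns steps taken. */
static size_t
event_walk(const struct event_list *l, size_t limit)
{
	const struct event *e;
	size_t n = 0;

	for (e = l->head; e != NULL && n < limit; e = e->next)
		n++;
	return n;
}
```

Queueing one event twice leaves its next pointer aimed at itself, so a
walk without a step limit never reaches the NULL terminator. Running
the callback directly, as this commit does, avoids keeping any such
list.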


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but sets us up to
bump that number.


# 1.5 15-Dec-2017 dlg

put where i'm up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you don't have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

i'm pretty sure we don't need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.95 07-Jan-2024 bluhm

In ixl(4) attach, initialize mutex before using it.

Function ixl_get_link_status() calls ixl_set_link_status() which
locks sc_link_state_mtx. Move initilization of mutex before calling
ixl_get_link_status(). This makes witness happy.

Bug reported and fix tested by Hrvoje Popovski; OK miod@


# 1.94 30-Dec-2023 bluhm

Set ixl(4) IXL_TX_PKT_DESCS to 8.

Mark Patruck has reported problems with ixl revision 1.90 TSO diff.
He uses ixl device passthrough from Linux via KVM to OpenBSD guest.
After a few hours of operation, interface locks up with oactive.
The problem also occures with TSO disabled, after the TSO diff had
been commited. deraadt@ has seen similar problems with ixl interface
on sparc64.
Changing IXL_TX_PKT_DESCS back to the original value 8 fixes the
lockup and even TSO on the hardware still works. FreeBSD and NetBSD
also use this value. The 32 was copied from ix(4) TSO diff and is
not necessary for ixl(4).

debugged with jan@; lot of bisecting and testing by Mark Patruck
OK mglocker@ patrick@


# 1.93 10-Nov-2023 bluhm

Make ifq and ifiq interface MP safe.

Rename ifq_set_maxlen() to ifq_init_maxlen(). This function neither
uses WRITE_ONCE() nor a mutex and is called before the ifq mutex
is initialized. The new name expresses that it should be used only
during interface attach when there is no concurrency.

Protect ifq_len(), ifq_empty(), ifiq_len(), and ifiq_empty() with
READ_ONCE(). They can be used without lock as they only read a
single integer.

OK dlg@


# 1.92 20-Oct-2023 jan

Add missing tcps_outpkttso counter to ixl(4) TSO setup.

ok bluhm@


# 1.91 20-Oct-2023 jan

Improve bad comment.

pointed out by kn@

ok kn@


# 1.90 19-Oct-2023 jan

Enable TCP Segmentation Offloading for ixl(4)

Tested on amd64 and sparc64.
Also tested by bluhm@.

ok bluhm@


Revision tags: OPENBSD_7_4_BASE
# 1.89 29-Sep-2023 bluhm

Replace kernel lock with mutex in ixl(4) media status.

Witness found that sc_atq_mtx mutex is held when kernel lock is
acquired. This might cause a deadlock. Protect sc_media_status
and sc_media_active with the link state mutex instead. Global
fields ifm->ifm_status and ifm->ifm_active are still protected by
kernel lock.

OK tobhe@


# 1.88 19-Jul-2023 jan

Protect ixl(4) admin queue with mutex(9).

with tweaks from bluhm

tested by bluhm

ok bluhm@


Revision tags: OPENBSD_7_3_BASE
# 1.87 06-Feb-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@, dlg@, naddy@ and bluhm@

"go for it" deraadt@

ok naddy@, mvs@


# 1.86 26-Jan-2023 deraadt

backing "consolidate mbuf header parsing on device driver layer"
easily repeatable ASSERT happens seconds after starting compiles over nfs.


# 1.85 24-Jan-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@ and dlg@

ok mvs@, dlg@


Revision tags: OPENBSD_7_2_BASE
# 1.84 05-Aug-2022 bluhm

The netlock for SIOCSIFMEDIA and SIOCGIFMEDIA ioctl is not necessary.
Legacy drivers run with kernel lock, interface media is MP safe or
has kernel lock. Assert kernel lock in ix(4) and ixl(4).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.83 11-Mar-2022 mpi

Constify struct cfattach.


# 1.82 10-Feb-2022 bluhm

Enable receive checksum offloading on ixl(4) network interfaces.
from jan@; test and OK dlg@


# 1.81 09-Feb-2022 dlg

enable hw vlan tag handling in the rx path too.

also tested on both x86 and sparc64.


# 1.80 09-Feb-2022 dlg

enable hardware vlan tagging.

tested on sparc64 and x86


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.176 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using an "int sized"
instructions, rather than the minimally sized byte instruction. This is
permitted by the language laywers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and removes the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly is allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual regsiters on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
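
The shift versus mask mixup described above can be shown with a small sketch. The `ISSET` definition mirrors the usual one from sys/param.h; the bit position `SPEED_10G_SHIFT` and the function names are hypothetical and are not taken from the driver.

```c
#include <stdint.h>

/* Mirrors the usual ISSET() definition from <sys/param.h>. */
#define ISSET(t, f)	((t) & (f))

/* Hypothetical bit position of a "10G" flag in a link speed bitfield. */
#define SPEED_10G_SHIFT	3

/* Buggy form: tests against the value 3 (bits 0 and 1), not bit 3. */
static int
speed_is_10g_buggy(uint8_t link_speed)
{
	return (ISSET(link_speed, SPEED_10G_SHIFT) != 0);
}

/* Fixed form: builds the mask for bit 3 before testing it. */
static int
speed_is_10g_fixed(uint8_t link_speed)
{
	return (ISSET(link_speed, 1 << SPEED_10G_SHIFT) != 0);
}
```

With only bit 3 set, the buggy test misses it, which is the kind of mismatch that would leave the baudrate at 0.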


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but sets us up to
bump that number.


# 1.5 15-Dec-2017 dlg

put where im up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you dont have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

im pretty sure we dont need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.94 30-Dec-2023 bluhm

Set ixl(4) IXL_TX_PKT_DESCS to 8.

Mark Patruck has reported problems with ixl revision 1.90 TSO diff.
He uses ixl device passthrough from Linux via KVM to OpenBSD guest.
After a few hours of operation, interface locks up with oactive.
The problem also occurs with TSO disabled, after the TSO diff had
been committed. deraadt@ has seen similar problems with ixl interface
on sparc64.
Changing IXL_TX_PKT_DESCS back to the original value 8 fixes the
lockup and even TSO on the hardware still works. FreeBSD and NetBSD
also use this value. The 32 was copied from ix(4) TSO diff and is
not necessary for ixl(4).

debugged with jan@; lot of bisecting and testing by Mark Patruck
OK mglocker@ patrick@


# 1.93 10-Nov-2023 bluhm

Make ifq and ifiq interface MP safe.

Rename ifq_set_maxlen() to ifq_init_maxlen(). This function neither
uses WRITE_ONCE() nor a mutex and is called before the ifq mutex
is initialized. The new name expresses that it should be used only
during interface attach when there is no concurrency.

Protect ifq_len(), ifq_empty(), ifiq_len(), and ifiq_empty() with
READ_ONCE(). They can be used without lock as they only read a
single integer.

OK dlg@


# 1.92 20-Oct-2023 jan

Add missing tcps_outpkttso counter to ixl(4) TSO setup.

ok bluhm@


# 1.91 20-Oct-2023 jan

Improve bad comment.

pointed out by kn@

ok kn@


# 1.90 19-Oct-2023 jan

Enable TCP Segmentation Offloading for ixl(4)

Tested on amd64 and sparc64.
Also tested by bluhm@.

ok bluhm@


Revision tags: OPENBSD_7_4_BASE
# 1.89 29-Sep-2023 bluhm

Replace kernel lock with mutex in ixl(4) media status.

Witness found that sc_atq_mtx mutex is held when kernel lock is
acquired. This might cause a deadlock. Protect sc_media_status
and sc_media_active with the link state mutex instead. Global
fields ifm->ifm_status and ifm->ifm_active are still protected by
kernel lock.

OK tobhe@


# 1.88 19-Jul-2023 jan

Protect ixl(4) admin queue with mutex(9).

with tweaks from bluhm

tested by bluhm

ok bluhm@


Revision tags: OPENBSD_7_3_BASE
# 1.87 06-Feb-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@, dlg@, naddy@ and bluhm@

"go for it" deraadt@

ok naddy@, mvs@


# 1.86 26-Jan-2023 deraadt

back out "consolidate mbuf header parsing on device driver layer"
easily repeatable ASSERT happens seconds after starting compiles over nfs.


# 1.85 24-Jan-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@ and dlg@

ok mvs@, dlg@


Revision tags: OPENBSD_7_2_BASE
# 1.84 05-Aug-2022 bluhm

The netlock for SIOCSIFMEDIA and SIOCGIFMEDIA ioctl is not necessary.
Legacy drivers run with kernel lock, interface media is MP safe or
has kernel lock. Assert kernel lock in ix(4) and ixl(4).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.83 11-Mar-2022 mpi

Constify struct cfattach.


# 1.82 10-Feb-2022 bluhm

Enable receive checksum offloading on ixl(4) network interfaces.
from jan@; test and OK dlg@


# 1.81 09-Feb-2022 dlg

enable hw vlan tag handling in the rx path too.

also tested on both x86 and sparc64.


# 1.80 09-Feb-2022 dlg

enable hardware vlan tagging.

tested on sparc64 and x86


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.76 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@
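
The "copy them onto the stack first" idea mentioned above can be sketched as follows. This is an illustration under assumptions, not the driver code: `struct tcp_hdr_min` is a hypothetical stand-in for the fixed part of a TCP header, and `tcp_hdr_len` is a made-up helper.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the fixed 20 byte part of a TCP header. */
struct tcp_hdr_min {
	uint16_t	sport;
	uint16_t	dport;
	uint32_t	seq;
	uint32_t	ack;
	uint8_t		off_x2;		/* data offset in the high 4 bits */
	uint8_t		flags;
	uint16_t	win;
	uint16_t	sum;
	uint16_t	urp;
};

/*
 * Return the TCP header length in bytes for a header that starts at
 * pkt + off. The header is copied to an aligned local variable with
 * memcpy() instead of being dereferenced in place, so a misaligned
 * offset is safe on strict alignment architectures.
 */
static unsigned int
tcp_hdr_len(const uint8_t *pkt, size_t off)
{
	struct tcp_hdr_min th;

	memcpy(&th, pkt + off, sizeof(th));

	return ((th.off_x2 >> 4) * 4);
}
```

The copy costs a few bytes of stack per packet, but it avoids both the unaligned load and the bit field access through a cast pointer.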


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using an "int sized"
instructions, rather than the minimally sized byte instruction. This is
permitted by the language lawyers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly it allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
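
The poll and diff approach described above can be sketched like this for a 48 bit counter. It is a minimal illustration, not the kstat code: the names `counter48_delta`, `counter48_update` and `struct counter_state` are hypothetical, and it assumes the counter wraps at most once between polls.

```c
#include <stdint.h>

/* Mask covering the 48 valid bits of a hardware counter. */
#define COUNTER48_MASK	((1ULL << 48) - 1)

/* Hypothetical per counter software state. */
struct counter_state {
	uint64_t	last;	/* raw value seen at the previous poll */
	uint64_t	total;	/* accumulated 64 bit software total */
};

/*
 * Difference between two raw reads of a 48 bit counter. Masking the
 * unsigned subtraction gives the right answer across a single wrap.
 */
static uint64_t
counter48_delta(uint64_t prev, uint64_t cur)
{
	return ((cur - prev) & COUNTER48_MASK);
}

/* Fold a new raw read into the running total. */
static void
counter48_update(struct counter_state *cs, uint64_t raw)
{
	raw &= COUNTER48_MASK;
	cs->total += counter48_delta(cs->last, raw);
	cs->last = raw;
}
```

Polling has to happen often enough that the counter cannot wrap twice between reads, otherwise the difference undercounts.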


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.76 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using an "int sized"
instruction, rather than the minimally sized byte instruction. This is
permitted by the language lawyers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly it allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
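
A minimal sketch of the poll-and-accumulate approach described above, assuming a free-running 48-bit hardware counter. The struct and function names are hypothetical and are not taken from the driver:

```c
#include <stdint.h>

/* Assumption: the hardware counter is 48 bits wide and never resets. */
#define COUNTER48_MASK	((1ULL << 48) - 1)

/* Hypothetical software state kept per counter between polls. */
struct soft_counter {
	uint64_t	last;	/* raw value seen at the previous poll */
	uint64_t	total;	/* 64-bit running total reported to users */
};

/*
 * Fold a freshly read raw counter value into the running total.
 * Masking the difference to 48 bits gives the right delta even if
 * the hardware counter wrapped once since the previous poll.
 */
void
soft_counter_poll(struct soft_counter *c, uint64_t raw)
{
	uint64_t delta = (raw - c->last) & COUNTER48_MASK;

	c->total += delta;
	c->last = raw;
}
```

This only stays accurate if the counter is polled more often than it can wrap, which is why a periodic poll is needed at all.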


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
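
The shift-versus-mask mix-up described above can be illustrated with a standalone sketch. The table contents, bit positions and function names here are hypothetical and do not reflect the real firmware values or the driver's code:

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the kernel's ISSET() macro, defined locally for the sketch. */
#define ISSET(t, f)	((t) & (f))

/* Hypothetical table: bit position in the link-speed field -> baudrate. */
struct speed_entry {
	unsigned int	shift;
	uint64_t	baudrate;
};

static const struct speed_entry speeds[] = {
	{ 1, 100000000ULL },	/* 100M */
	{ 2, 1000000000ULL },	/* 1G */
	{ 3, 10000000000ULL },	/* 10G */
	{ 4, 40000000000ULL },	/* 40G */
};
#define NSPEEDS	(sizeof(speeds) / sizeof(speeds[0]))

/* Buggy variant: tests the bit position itself as if it were a mask. */
uint64_t
baudrate_buggy(uint8_t link_speed)
{
	size_t i;

	for (i = 0; i < NSPEEDS; i++) {
		if (ISSET(link_speed, speeds[i].shift))
			return (speeds[i].baudrate);
	}
	return (0);
}

/* Fixed variant: turns the bit position into a mask before testing. */
uint64_t
baudrate_fixed(uint8_t link_speed)
{
	size_t i;

	for (i = 0; i < NSPEEDS; i++) {
		if (ISSET(link_speed, 1U << speeds[i].shift))
			return (speeds[i].baudrate);
	}
	return (0);
}
```

With only bit 3 set in the field, the buggy variant matches nothing and returns 0, while the fixed variant returns the 10G entry.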


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us up
to bump that number.


# 1.5 15-Dec-2017 dlg

put where im up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you dont have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

im pretty sure we dont need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us up to
bump that number.


# 1.5 15-Dec-2017 dlg

put where im up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you dont have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

im pretty sure we dont need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.92 20-Oct-2023 jan

Add missing tcps_outpkttso counter to ixl(4) TSO setup.

ok bluhm@


# 1.91 20-Oct-2023 jan

Improve bad comment.

pointed out by kn@

ok kn@


# 1.90 19-Oct-2023 jan

Enable TCP Segmentation Offloading for ixl(4)

Tested on amd64 and sparc64.
Also tested by bluhm@.

ok bluhm@


Revision tags: OPENBSD_7_4_BASE
# 1.89 29-Sep-2023 bluhm

Replace kernel lock with mutex in ixl(4) media status.

Witness found that sc_atq_mtx mutex is held when kernel lock is
acquired. This might cause a deadlock. Protect sc_media_status
and sc_media_active with the link state mutex instead. Global
fields ifm->ifm_status and ifm->ifm_active are still protected by
kernel lock.

OK tobhe@


# 1.88 19-Jul-2023 jan

Protect ixl(4) admin queue with mutex(9).

with tweaks from bluhm

tested by bluhm

ok bluhm@


Revision tags: OPENBSD_7_3_BASE
# 1.87 06-Feb-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@, dlg@, naddy@ and bluhm@

"go for it" deraadt@

ok naddy@, mvs@


# 1.86 26-Jan-2023 deraadt

backing out "consolidate mbuf header parsing on device driver layer"
easily repeatable ASSERT happens seconds after starting compiles over nfs.


# 1.85 24-Jan-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@ and dlg@

ok mvs@, dlg@


Revision tags: OPENBSD_7_2_BASE
# 1.84 05-Aug-2022 bluhm

The netlock for SIOCSIFMEDIA and SIOCGIFMEDIA ioctl is not necessary.
Legacy drivers run with kernel lock, interface media is MP safe or
has kernel lock. Assert kernel lock in ix(4) and ixl(4).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.83 11-Mar-2022 mpi

Constify struct cfattach.


# 1.82 10-Feb-2022 bluhm

Enable receive checksum offloading on ixl(4) network interfaces.
from jan@; test and OK dlg@


# 1.81 09-Feb-2022 dlg

enable hw vlan tag handling in the rx path too.

also tested on both x86 and sparc64.


# 1.80 09-Feb-2022 dlg

enable hardware vlan tagging.

tested on sparc64 and x86


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.76 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.
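
As a rough illustration of "copy them onto the stack first": the sketch
below copies header bytes out of a packet buffer with memcpy() before
reading any field, so the read is safe even when the header starts at a
misaligned offset. struct hdr_prefix and hdr_copy_out() are hypothetical
names made up for this example, not code from the driver, and the struct
only stands in for the first few fields of a real header.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the leading fields of a TCP header. */
struct hdr_prefix {
	uint16_t sport;
	uint16_t dport;
	uint32_t seq;
};

/*
 * Copy the header out of the packet into a properly aligned local
 * struct instead of casting pkt + off to a struct pointer, which
 * could fault on strict alignment architectures.
 */
static struct hdr_prefix
hdr_copy_out(const uint8_t *pkt, size_t off)
{
	struct hdr_prefix h;

	memcpy(&h, pkt + off, sizeof(h));
	return h;
}
```

The cost is one small copy per packet, which is why the commit prefers
to parse the headers before m_defrag while they are still aligned.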

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using "int sized"
instructions, rather than the minimally sized byte instruction. This is
permitted by the language lawyers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly it allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.
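
For illustration only, here is one way a queue count could be picked
under the two constraints in the notes above: a power of 2, capped at 8.
The commit does not say how intrmap rounds, so rounding down is an
assumption here, and pow2_floor() and pick_nqueues() are hypothetical
helpers, not the intrmap API.

```c
#include <assert.h>

/* Largest power of two that is <= n; n must be at least 1. */
static unsigned int
pow2_floor(unsigned int n)
{
	unsigned int p = 1;

	assert(n >= 1);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/* Queue count: limited by cpus and by maxq, rounded down to 2^k. */
static unsigned int
pick_nqueues(unsigned int ncpus, unsigned int maxq)
{
	unsigned int n = ncpus < maxq ? ncpus : maxq;

	return pow2_floor(n);
}
```

Under these assumptions a 6 cpu box with a limit of 8 would end up with
4 queues, and a 16 cpu box would end up with 8.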

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
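
The polling scheme described above can be sketched roughly as follows. This is
only an illustration, not the driver's code: the 48 bit width, the struct
ctr_state layout and the ctr_init()/ctr_update() names are assumptions made up
for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width of a hardware counter field, for illustration only. */
#define CTR_BITS	48
#define CTR_MASK	((UINT64_C(1) << CTR_BITS) - 1)

/* Hypothetical per-counter software state kept between polls. */
struct ctr_state {
	uint64_t last;	/* raw value seen at the previous poll */
	uint64_t total;	/* accumulated 64 bit total since ctr_init() */
};

/* The chip never resets the counter, so remember a baseline instead. */
static void
ctr_init(struct ctr_state *cs, uint64_t raw)
{
	cs->last = raw & CTR_MASK;
	cs->total = 0;
}

/*
 * Fold a new raw reading into the running total.  Subtracting modulo
 * 2^48 gives the right delta as long as the counter wrapped at most
 * once since the previous poll, which is why polls must be periodic.
 */
static uint64_t
ctr_update(struct ctr_state *cs, uint64_t raw)
{
	uint64_t delta;

	assert((raw & ~CTR_MASK) == 0);
	delta = (raw - cs->last) & CTR_MASK;
	cs->last = raw;
	cs->total += delta;
	return (cs->total);
}
```

A periodic timeout would call ctr_update() once per counter with the freshly
read register value, and consumers would only ever look at the total.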


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.
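
for readers unfamiliar with the hash, a textbook Toeplitz computation looks
roughly like the sketch below. it is not taken from the driver or the chip
documentation: the function name and the key handling are assumptions for
illustration, and how the chip turns the hash into a ring number is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Textbook Toeplitz hash: for every set bit in the input, xor in the
 * 32 bit window of the key that starts at that bit position.
 * The key must be at least len + 4 bytes long.
 */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t keylen, const uint8_t *data,
    size_t len)
{
	uint32_t hash = 0, window;
	size_t i, kbit;
	int b;

	assert(keylen >= len + 4);

	/* the window starts out as the first 32 bits of the key */
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < len; i++) {
		for (b = 7; b >= 0; b--) {
			if (data[i] & (1U << b))
				hash ^= window;

			/* slide the window one bit along the key */
			kbit = i * 8 + (size_t)(7 - b) + 32;
			window <<= 1;
			if (kbit / 8 < keylen &&
			    (key[kbit / 8] & (0x80U >> (kbit % 8))))
				window |= 1;
		}
	}

	return (hash);
}
```

the hash is linear over xor, so packets of the same flow always produce the
same value and can be steered to the same ring.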

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.
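
the power of 2 constraint from the first two points can be sketched like this.
it is an illustration only: queues_pow2() is a made-up helper and is not how
intrmap_create() is implemented.

```c
/*
 * Pick the largest power of two that fits both the cpu count and the
 * driver's own limit.  Always returns at least 1.
 */
static unsigned int
queues_pow2(unsigned int ncpus, unsigned int limit)
{
	unsigned int n = (ncpus < limit) ? ncpus : limit;
	unsigned int q = 1;

	/* comparing against n / 2 avoids overflowing q */
	while (q <= n / 2)
		q *= 2;

	return (q);
}
```

with this sketch a box with 6 cpus and a limit of 8 would end up with 4
queues, and a 16 cpu box would be capped at 8.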

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.
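
a minimal sketch of that mistake, with a made-up SPEED_10G_SHIFT constant and
helper names that are not in the driver:

```c
#include <stdint.h>

#ifndef ISSET
#define ISSET(t, f)	((t) & (f))
#endif

/* Hypothetical bit position of a link speed flag. */
#define SPEED_10G_SHIFT	3

/* Buggy: the shift count 3 is used as the mask, so bits 0 and 1 are tested. */
static int
speed_is_10g_buggy(uint32_t bitfield)
{
	return (ISSET(bitfield, SPEED_10G_SHIFT) != 0);
}

/* Fixed: the shift count is turned into a single bit mask first. */
static int
speed_is_10g(uint32_t bitfield)
{
	return (ISSET(bitfield, UINT32_C(1) << SPEED_10G_SHIFT) != 0);
}
```

with only bit 3 set, the buggy version reports no match and the fixed one
matches, which is the kind of mismatch that left the baudrate at 0.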

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us up
to bump that number.


# 1.5 15-Dec-2017 dlg

put where im up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you dont have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

im pretty sure we dont need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.88 19-Jul-2023 jan

Protect ixl(4) admin queue with mutex(9).

with tweaks from bluhm

tested by bluhm

ok bluhm@


Revision tags: OPENBSD_7_3_BASE
# 1.87 06-Feb-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@, dlg@, naddy@ and bluhm@

"go for it" deraadt@

ok naddy@, mvs@


# 1.86 26-Jan-2023 deraadt

backing out "consolidate mbuf header parsing on device driver layer"
easily repeatable ASSERT happens seconds after starting compiles over nfs.


# 1.85 24-Jan-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@ and dlg@

ok mvs@, dlg@


Revision tags: OPENBSD_7_2_BASE
# 1.84 05-Aug-2022 bluhm

The netlock for SIOCSIFMEDIA and SIOCGIFMEDIA ioctl is not necessary.
Legacy drivers run with kernel lock, interface media is MP safe or
has kernel lock. Assert kernel lock in ix(4) and ixl(4).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.83 11-Mar-2022 mpi

Constify struct cfattach.


# 1.82 10-Feb-2022 bluhm

Enable receive checksum offloading on ixl(4) network interfaces.
from jan@; test and OK dlg@


# 1.81 09-Feb-2022 dlg

enable hw vlan tag handling in the rx path too.

also tested on both x86 and sparc64.


# 1.80 09-Feb-2022 dlg

enable hardware vlan tagging.

tested on sparc64 and x86


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.76 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.
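
copying a header onto the stack before looking at it could be sketched like
this. the struct is a simplified stand-in for the real tcphdr and
tcp_hdr_len() is a made-up helper, so treat it as an illustration of the idea
rather than driver code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a TCP header, used only in this sketch. */
struct sketch_tcphdr {
	uint16_t sport;
	uint16_t dport;
	uint32_t seq;
	uint32_t ack;
	uint8_t  off_rsvd;	/* data offset in the high 4 bits */
	uint8_t  flags;
	uint16_t win;
	uint16_t sum;
	uint16_t urp;
};

/*
 * Return the TCP header length in bytes for a header that starts at an
 * arbitrary, possibly misaligned, offset in the packet.  memcpy() does
 * the byte-wise copy, so the fields are only read from the aligned
 * stack copy.
 */
static size_t
tcp_hdr_len(const uint8_t *pkt, size_t off)
{
	struct sketch_tcphdr th;

	memcpy(&th, pkt + off, sizeof(th));
	return ((size_t)(th.off_rsvd >> 4) * 4);
}
```

the copy costs a few bytes per packet, but it is safe on strict alignment
architectures no matter where the header ends up.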

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using "int sized"
instructions, rather than the minimally sized byte instruction. This is
permitted by the language lawyers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and removes the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly is allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual regsiters on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value localy.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and move it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us
up to bump that number.


# 1.5 15-Dec-2017 dlg

put where i'm up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you don't have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

i'm pretty sure we don't need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.87 06-Feb-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@, dlg@, naddy@ and bluhm@

"go for it" deraadt@

ok naddy@, mvs@


# 1.86 26-Jan-2023 deraadt

back out "consolidate mbuf header parsing on device driver layer"
an easily repeatable ASSERT happens seconds after starting compiles over nfs.


# 1.85 24-Jan-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@ and dlg@

ok mvs@, dlg@


Revision tags: OPENBSD_7_2_BASE
# 1.84 05-Aug-2022 bluhm

The netlock for SIOCSIFMEDIA and SIOCGIFMEDIA ioctl is not necessary.
Legacy drivers run with kernel lock, interface media is MP safe or
has kernel lock. Assert kernel lock in ix(4) and ixl(4).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.83 11-Mar-2022 mpi

Constify struct cfattach.


# 1.82 10-Feb-2022 bluhm

Enable receive checksum offloading on ixl(4) network interfaces.
from jan@; test and OK dlg@


# 1.81 09-Feb-2022 dlg

enable hw vlan tag handling in the rx path too.

also tested on both x86 and sparc64.


# 1.80 09-Feb-2022 dlg

enable hardware vlan tagging.

tested on sparc64 and x86


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.76 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@
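
The "copy them onto the stack first" approach mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the driver's code: struct ex_tcphdr and ex_tcp_hlen() are hypothetical names, and the struct is a simplified fixed 20-byte TCP header layout. The point is that memcpy() into a local variable never performs a wide load from a possibly misaligned packet pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified 20-byte TCP header; not the kernel's struct tcphdr. */
struct ex_tcphdr {
	uint16_t	sport;
	uint16_t	dport;
	uint32_t	seq;
	uint32_t	ack;
	uint8_t		off_x2;		/* data offset in the upper 4 bits */
	uint8_t		flags;
	uint16_t	win;
	uint16_t	sum;
	uint16_t	urp;
};

/*
 * Copy the header at byte offset "off" onto the stack and return the
 * TCP header length in bytes through "hlen". Returns -1 if the buffer
 * is too short to hold a full header at that offset.
 */
static int
ex_tcp_hlen(const uint8_t *pkt, size_t len, size_t off, unsigned int *hlen)
{
	struct ex_tcphdr th;

	assert(pkt != NULL && hlen != NULL);
	if (off > len || len - off < sizeof(th))
		return (-1);

	/* byte-wise copy: safe even when pkt + off is not 4 byte aligned */
	memcpy(&th, pkt + off, sizeof(th));

	/* the data offset counts 32-bit words, so multiply by 4 */
	*hlen = (unsigned int)(th.off_x2 >> 4) << 2;
	return (0);
}
```

The copy costs a few extra cycles per packet, which is presumably why the commit prefers to parse the headers before m_defrag, while the stack's alignment guarantees still hold.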


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using "int sized"
instructions, rather than the minimally sized byte instruction. This is
permitted by the language lawyers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly it allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesn't support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
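
The poll-and-difference scheme described above can be sketched like this. It is a minimal illustration, assuming a counter that is never reset and that wraps at a known field width; struct ex_counter and both functions are hypothetical names, not the driver's kstat code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-counter state; not the driver's actual structure. */
struct ex_counter {
	uint64_t	last;	/* raw hardware value seen at the previous poll */
	uint64_t	total;	/* accumulated value since ex_counter_init() */
	unsigned int	bits;	/* width of the hardware field, e.g. 32 or 48 */
};

/* Record the current raw value as the baseline, since the chip never resets it. */
static void
ex_counter_init(struct ex_counter *c, unsigned int bits, uint64_t raw)
{
	assert(bits > 0 && bits < 64);
	c->bits = bits;
	c->last = raw;
	c->total = 0;
}

/* Fold a newly read raw value into the running 64-bit total. */
static void
ex_counter_poll(struct ex_counter *c, uint64_t raw)
{
	uint64_t mask = (1ULL << c->bits) - 1;

	/* subtraction modulo 2^bits tolerates one wrap between polls */
	c->total += (raw - c->last) & mask;
	c->last = raw;
}
```

The sketch only stays correct if polls happen often enough that a counter cannot wrap more than once between two reads, which is the reason for polling periodically.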


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@
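
As an illustration of the power of 2 constraint noted above, the queue count can be thought of as the largest power of two that fits within both the cpu count and the driver's cap. The helper below is a hypothetical sketch of that rounding only; it is not how intrmap_create or INTRMAP_POWEROF2 is implemented.

```c
#include <assert.h>

/*
 * Hypothetical helper: largest power of two that is no bigger than
 * either the number of cpus or the driver's queue limit.
 */
static unsigned int
ex_pick_nqueues(unsigned int ncpus, unsigned int maxq)
{
	unsigned int limit, n;

	assert(ncpus > 0 && maxq > 0);
	limit = (ncpus < maxq) ? ncpus : maxq;

	/* keep doubling while the doubled value still fits under the limit */
	for (n = 1; n <= limit / 2; n *= 2)
		continue;

	return (n);
}
```

Under these assumptions a box with 6 cpus and a cap of 8 would end up with 4 queues.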


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
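
The ISSET mistake described above is easy to reproduce in isolation. The sketch below assumes the usual ISSET(t, f) definition as a plain bitwise AND; the EX_* bit positions and both helper functions are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Plain bitwise test, matching the usual ISSET() definition. */
#define ISSET(t, f)	((t) & (f))

/* Hypothetical bit positions, chosen only for this example. */
#define EX_SPEED_1G_SHIFT	2
#define EX_SPEED_10G_SHIFT	3

/* Buggy form: passes the shift count itself as if it were a mask. */
static int
ex_has_speed_buggy(uint8_t link_speed, unsigned int shift)
{
	return (ISSET(link_speed, shift) != 0);
}

/* Fixed form: builds the single-bit mask before testing it. */
static int
ex_has_speed_fixed(uint8_t link_speed, unsigned int shift)
{
	assert(shift < 8);
	return (ISSET(link_speed, 1U << shift) != 0);
}
```

With bit 3 set, the buggy form tests against the value 3 (bits 0 and 1) and misses the match, so a lookup built on it would fall through and report no speed at all.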


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.
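
A rough sketch of that check, under stated assumptions: the descriptor layout,
the AQ_FLAG_DONE bit position and the aq_completed() helper are all
hypothetical and only illustrate the idea of stopping at the first entry the
firmware has not marked as done.

```c
#include <assert.h>
#include <stdint.h>

#define AQ_FLAG_DONE	(1U << 0)	/* hypothetical "done" bit */

struct aq_desc {			/* hypothetical, trimmed descriptor */
	uint16_t flags;
	uint16_t opcode;
};

/*
 * walk the ring from cons towards prod and count the entries the
 * firmware has finished. stop at the first one without the done bit
 * instead of blindly completing everything up to prod.
 */
static unsigned int
aq_completed(const struct aq_desc *ring, unsigned int nslots,
    unsigned int cons, unsigned int prod)
{
	unsigned int n = 0;

	assert(nslots > 0 && cons < nslots && prod < nslots);

	while (cons != prod) {
		if (!(ring[cons].flags & AQ_FLAG_DONE))
			break;
		n++;
		cons = (cons + 1) % nslots;
	}

	return (n);
}
```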

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us up
to bump that number.


# 1.5 15-Dec-2017 dlg

put where im up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you dont have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

im pretty sure we dont need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.85 24-Jan-2023 jan

consolidate mbuf header parsing on device driver layer

with tweaks from mvs@, mpi@ and dlg@

ok mvs@, dlg@


Revision tags: OPENBSD_7_2_BASE
# 1.84 05-Aug-2022 bluhm

The netlock for SIOCSIFMEDIA and SIOCGIFMEDIA ioctl is not necessary.
Legacy drivers run with kernel lock, interface media is MP safe or
has kernel lock. Assert kernel lock in ix(4) and ixl(4).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.83 11-Mar-2022 mpi

Constify struct cfattach.


# 1.82 10-Feb-2022 bluhm

Enable receive checksum offloading on ixl(4) network interfaces.
from jan@; test and OK dlg@


# 1.81 09-Feb-2022 dlg

enable hw vlan tag handling in the rx path too.

also tested on both x86 and sparc64.


# 1.80 09-Feb-2022 dlg

enable hardware vlan tagging.

tested on sparc64 and x86


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.76 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.
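
A small userland sketch of what copying a header onto the stack could look
like. struct tcp_prefix and tcp_seq_at() are hypothetical names made up for
this example, and the code works on a plain byte buffer rather than an mbuf.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* hypothetical: first 8 bytes of a tcp header, fields in network order */
struct tcp_prefix {
	uint16_t sport;
	uint16_t dport;
	uint32_t seq;
};

/* read the sequence number of a tcp header that starts at buf + off */
static uint32_t
tcp_seq_at(const unsigned char *buf, size_t len, size_t off)
{
	struct tcp_prefix th;

	assert(off <= len && len - off >= sizeof(th));

	/* memcpy has no alignment requirement on its source */
	memcpy(&th, buf + off, sizeof(th));

	/* th lives on the stack and is aligned, so this load is safe */
	return (ntohl(th.seq));
}
```

Casting buf + off to a struct pointer and dereferencing it would be the
unsafe variant on strict alignment architectures when off is odd.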

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using "int sized"
instructions, rather than the minimally sized byte instruction. This is
permitted by the language lawyers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly it allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
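
The polling arithmetic can be sketched as follows, assuming a 48 bit wide
counter that wraps and is never reset. struct ctr48 and the two functions are
hypothetical names for this example and are not the driver's kstat code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTR48_MASK	((1ULL << 48) - 1)

struct ctr48 {
	uint64_t prev;		/* raw value seen at the last poll */
	uint64_t total;		/* accumulated 64 bit count */
};

/* difference between two raw reads, modulo 2^48 so one wrap is handled */
static uint64_t
ctr48_delta(uint64_t prev, uint64_t cur)
{
	return ((cur - prev) & CTR48_MASK);
}

/* fold a new raw read into the running total and return the total */
static uint64_t
ctr48_update(struct ctr48 *c, uint64_t raw)
{
	assert(c != NULL);

	raw &= CTR48_MASK;
	c->total += ctr48_delta(c->prev, raw);
	c->prev = raw;

	return (c->total);
}
```

Since the hardware counter is not reset, the first raw read would be stored in
prev as a baseline before any update is counted. The sketch is only correct if
the counter wraps at most once between polls, which is why the polling has to
be periodic.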


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.
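
For reference, a generic software Toeplitz hash can be sketched as below. The
chip computes the hash in hardware, so this is not driver code. example_key is
just an example key for the sketch, and ring_for_hash() is a hypothetical
simplification that stands in for the hardware lookup table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* example 40 byte key, only for this sketch */
static const uint8_t example_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/*
 * for every set bit in the input, xor in the 32 bit window of the key
 * that starts at the same bit position.
 */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t keylen, const uint8_t *data,
    size_t datalen)
{
	uint32_t hash = 0, window;
	size_t i;
	int b;

	assert(keylen >= datalen + 4);

	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < datalen; i++) {
		for (b = 7; b >= 0; b--) {
			if (data[i] & (1 << b))
				hash ^= window;
			/* slide the window left, pulling in the next key bit */
			window <<= 1;
			if (key[i + 4] & (1 << b))
				window |= 1;
		}
	}

	return (hash);
}

/* simplified ring choice; nrings is assumed to be a power of two */
static unsigned int
ring_for_hash(uint32_t hash, unsigned int nrings)
{
	assert(nrings > 0 && (nrings & (nrings - 1)) == 0);
	return (hash & (nrings - 1));
}
```

Packets of one flow hash to the same value and so stay on one ring, while
different flows tend to spread over the rings.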

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.
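
The power of 2 constraint from the first two points can be sketched like this.
nqueues_pow2() is a hypothetical helper that only shows the arithmetic; it is
not how intrmap_create or INTRMAP_POWEROF2 are implemented.

```c
#include <assert.h>

/*
 * largest power of two that is no bigger than the number of cpus
 * and no bigger than the queue limit.
 */
static unsigned int
nqueues_pow2(unsigned int ncpus, unsigned int maxq)
{
	unsigned int n = 1;

	assert(ncpus > 0 && maxq > 0);

	/* compare against the halves so n * 2 cannot overflow */
	while (n <= ncpus / 2 && n <= maxq / 2)
		n *= 2;

	return (n);
}
```

Under this sketch a box with 6 cpus and a limit of 8 ends up with 4 queues.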

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.84 05-Aug-2022 bluhm

The netlock for SIOCSIFMEDIA and SIOCGIFMEDIA ioctl is not necessary.
Legacy drivers run with kernel lock, interface media is MP safe or
has kernel lock. Assert kernel lock in ix(4) and ixl(4).
OK kettenis@


Revision tags: OPENBSD_7_1_BASE
# 1.83 11-Mar-2022 mpi

Constify struct cfattach.


# 1.82 10-Feb-2022 bluhm

Enable receive checksum offloading on ixl(4) network interfaces.
from jan@; test and OK dlg@


# 1.81 09-Feb-2022 dlg

enable hw vlan tag handling in the rx path too.

also tested on both x86 and sparc64.


# 1.80 09-Feb-2022 dlg

enable hardware vlan tagging.

tested on sparc64 and x86


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.176 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using an "int sized"
instructions, rather than the minimally sized byte instruction. This is
permitted by the language laywers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and removes the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly is allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
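
The poll-and-diff approach described above can be sketched as follows. This is a minimal illustration, not the driver's code: the names counter_delta, counter_poll and struct counter_state are hypothetical, and it assumes the counter wraps at most once between polls.

```c
#include <stdint.h>

/*
 * Hypothetical helper: difference between two raw readings of a
 * hardware counter that is only `width` bits wide (e.g. 32 or 48).
 */
uint64_t
counter_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
	uint64_t mask;

	mask = (width >= 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);

	/* modular subtraction stays correct across a single wrap */
	return ((cur - prev) & mask);
}

struct counter_state {
	uint64_t last;	/* raw value seen at the previous poll */
	uint64_t total;	/* accumulated 64-bit total */
};

/* called from a periodic timeout with the freshly read raw value */
void
counter_poll(struct counter_state *cs, uint64_t raw, unsigned int width)
{
	cs->total += counter_delta(cs->last, raw, width);
	cs->last = raw;
}
```

Because the chip never resets the counters, only the accumulated total is meaningful to a consumer; the raw value is just the reference point for the next poll.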


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.
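
For reference, the Toeplitz hash that RSS relies on can be sketched in software as below. The chip computes this in hardware; the function name and signature here are hypothetical and only illustrate the algorithm, assuming a key of at least 4 bytes and input bytes in network order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative Toeplitz hash: for every set bit in the input, xor in
 * the 32-bit window of the key that starts at the same bit position.
 * Assumes keylen >= 4.
 */
uint32_t
toeplitz_hash(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t hash = 0;
	uint32_t window;
	size_t i;
	int b;

	/* the window starts out as the first 32 bits of the key */
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < datalen; i++) {
		for (b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;

			/* slide the window one bit along the key */
			window <<= 1;
			if (i + 4 < keylen && (key[i + 4] & (1u << b)))
				window |= 1;
		}
	}

	return (hash);
}
```

Packets from the same flow produce the same hash, so they stay on one rx ring, while different flows spread across the rings.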


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@
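
The power-of-2 constraint in the notes above can be sketched like this. The helper is hypothetical and is not the intrmap(9) implementation; it only shows the rounding, with the limit of 8 taken from the commit message.

```c
/*
 * Hypothetical helper: largest power of two that is <= ncpus,
 * capped at maxqueues (8 in the commit above).
 */
unsigned int
queue_count(unsigned int ncpus, unsigned int maxqueues)
{
	unsigned int n = 1;

	if (ncpus > maxqueues)
		ncpus = maxqueues;

	/* keep doubling while the next power of two still fits */
	while (n * 2 <= ncpus)
		n *= 2;

	return (n);
}
```

Under these assumptions a box with 6 cpus ends up with 4 queues, and anything with 8 or more cpus ends up with 8.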


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue to move it to the idle queue,
actually remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
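
A small sketch of the ISSET mix-up described above. The bit position and function names are hypothetical; ISSET is defined locally with the same shape as the kernel macro.

```c
#include <stdint.h>

#ifndef ISSET
#define ISSET(t, f)	((t) & (f))
#endif

#define SPEED_10G_SHIFT	3	/* hypothetical bit position */

/* buggy: tests against the shift count (3 == 0b011), not the bit */
int
speed_is_10g_buggy(uint32_t bits)
{
	return (ISSET(bits, SPEED_10G_SHIFT) != 0);
}

/* fixed: tests against the mask built from the shift */
int
speed_is_10g_fixed(uint32_t bits)
{
	return (ISSET(bits, 1 << SPEED_10G_SHIFT) != 0);
}
```

With only bit 3 set, the buggy test misses it because 8 & 3 is 0, and with only bit 0 set it matches by accident, which is how a lookup like this can fail to line up with the reported speeds.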


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@
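
Why queueing the same entry twice breaks a singly linked list can be sketched as follows. This uses a hand-rolled list rather than the queue(3) SLIST macros, assumes head insertion, and all names are hypothetical.

```c
#include <stddef.h>

struct ev {
	struct ev *next;
};

void
list_insert_head(struct ev **head, struct ev *e)
{
	e->next = *head;
	*head = e;
}

/* Floyd's cycle detection: returns 1 if walking the list never ends */
int
list_has_cycle(const struct ev *head)
{
	const struct ev *slow = head, *fast = head;

	while (fast != NULL && fast->next != NULL) {
		slow = slow->next;
		fast = fast->next->next;
		if (slow == fast)
			return (1);
	}

	return (0);
}

/* queue one event entry twice, as two link state events might */
int
double_insert_makes_cycle(void)
{
	struct ev link_ev = { NULL };
	struct ev *head = NULL;

	list_insert_head(&head, &link_ev);
	list_insert_head(&head, &link_ev);	/* now link_ev.next == &link_ev */

	return (list_has_cycle(head));
}
```

After the second insert the entry points at itself, so any loop that walks the list until it sees NULL spins forever.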


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us
up to bump that number.


# 1.5 15-Dec-2017 dlg

put where i'm up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you don't have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

i'm pretty sure we don't need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.83 11-Mar-2022 mpi

Constify struct cfattach.


# 1.82 10-Feb-2022 bluhm

Enable receive checksum offloading on ixl(4) network interfaces.
from jan@; test and OK dlg@


# 1.81 09-Feb-2022 dlg

enable hw vlan tag handling in the rx path too.

also tested on both x86 and sparc64.


# 1.80 09-Feb-2022 dlg

enable hardware vlan tagging.

tested on sparc64 and x86


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.76 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@
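
The "copy them onto the stack first" idea mentioned above can be sketched as below. This is a generic illustration rather than the driver's code: load_be32 is a hypothetical helper that reads a 32-bit big-endian field from a buffer of unknown alignment.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a 32-bit network-order field without
 * dereferencing a possibly misaligned pointer.
 */
uint32_t
load_be32(const uint8_t *p)
{
	uint32_t v;

	/* byte-wise copy into an aligned local is valid at any offset */
	memcpy(&v, p, sizeof(v));

	return (ntohl(v));
}
```

Casting the packet pointer to a header struct and reading the field directly is cheaper, but it is only safe while the stack's alignment guarantees hold, which is the reason for parsing before m_defrag.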


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using "int sized"
instructions, rather than the minimally sized byte instruction. This is
permitted by the language lawyers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@



# 1.82 10-Feb-2022 bluhm

Enable receive checksum offloading on ixl(4) network interfaces.
from jan@; test and OK dlg@


# 1.81 09-Feb-2022 dlg

enable hw vlan tag handling in the rx path too.

also tested on both x86 and sparc64.


# 1.80 09-Feb-2022 dlg

enable hardware vlan tagging.

tested on sparc64 and x86


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.76 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@
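
The "copy them onto the stack first" option mentioned in r1.79 can be sketched as follows. This is only an illustration under stated assumptions: struct demo_hdr and demo_hdr_len() are hypothetical names, not the real struct ip/struct tcphdr or any ixl(4) function. The point is that memcpy() into a local variable is safe at any alignment, where a cast and dereference is not.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte header for illustration; not struct ip or tcphdr. */
struct demo_hdr {
	uint16_t len;
	uint16_t flags;
};

/*
 * Read the len field of a header that starts at an arbitrary byte
 * offset in a packet buffer.  Casting (pkt + off) to a struct pointer
 * and dereferencing it may fault on strict alignment architectures,
 * so the header is copied into an aligned local first.
 */
static uint16_t
demo_hdr_len(const unsigned char *pkt, size_t off)
{
	struct demo_hdr h;

	memcpy(&h, pkt + off, sizeof(h));
	return (h.len);
}
```

The copy costs a few bytes per packet, which is why the commit prefers parsing before m_defrag when the stack has already guaranteed alignment.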


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using an "int sized"
instruction, rather than the minimally sized byte instruction. This is
permitted by the language lawyers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly it allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
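
The poll-and-diff approach described in r1.60 might look roughly like the sketch below. It assumes a 48-bit hardware counter (r1.61 mentions 48 and 64 bit counter reads); struct counter48 and the function names are hypothetical and are not taken from the driver. The masked subtraction gives the right delta even when the hardware value has wrapped once between polls.

```c
#include <stdint.h>

/* Assumed counter width of 48 bits, for illustration. */
#define COUNTER48_MASK	((UINT64_C(1) << 48) - 1)

struct counter48 {
	uint64_t last;	/* raw value seen at the previous poll */
	uint64_t total;	/* accumulated 64-bit count since init */
};

/* Record the starting raw value; the hardware counter is never reset. */
static void
counter48_init(struct counter48 *c, uint64_t raw)
{
	c->last = raw & COUNTER48_MASK;
	c->total = 0;
}

/*
 * Fold a new raw reading into the running total.  The subtraction is
 * done modulo 2^48, so a single wrap between polls is handled.
 */
static uint64_t
counter48_update(struct counter48 *c, uint64_t raw)
{
	uint64_t delta;

	raw &= COUNTER48_MASK;
	delta = (raw - c->last) & COUNTER48_MASK;
	c->last = raw;
	c->total += delta;

	return (c->total);
}
```

This only works if the poll interval is short enough that the counter cannot wrap more than once between reads, which is the reason for polling periodically.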


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@
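
The power of 2 constraint noted in r1.52 can be illustrated with a small helper. This is a sketch only, not how intrmap_create() is implemented; queues_for_cpus() and its parameters are hypothetical. It rounds the usable queue count down to a power of 2 and caps it at a maximum.

```c
/*
 * Pick the largest power of 2 that does not exceed either the number
 * of cpus or the queue limit.  Always returns at least 1.
 */
static unsigned int
queues_for_cpus(unsigned int ncpus, unsigned int maxq)
{
	unsigned int limit = (ncpus < maxq) ? ncpus : maxq;
	unsigned int n = 1;

	while (n * 2 <= limit)
		n *= 2;

	return (n);
}
```

Under these assumptions a box with 6 cpus and a limit of 8 ends up with 4 queues.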


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
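
The ISSET mistake described in r1.44 is easy to reproduce in isolation. In the sketch below the ISSET() definition matches the one in OpenBSD's sys/param.h, but the SPEED_SHIFT_* values and the two helper functions are made up for illustration and are not the ixl(4) constants.

```c
#include <stdint.h>

/* Same definition as ISSET() in OpenBSD's <sys/param.h>. */
#define ISSET(t, f)	((t) & (f))

/* Hypothetical bit positions, for illustration only. */
#define SPEED_SHIFT_1G	2
#define SPEED_SHIFT_10G	3

/* Buggy form: tests the shift count itself as if it were a mask. */
static int
speed_isset_buggy(uint32_t bitfield, unsigned int shift)
{
	return (ISSET(bitfield, shift) != 0);
}

/* Fixed form: builds the single-bit mask before testing. */
static int
speed_isset_fixed(uint32_t bitfield, unsigned int shift)
{
	return (ISSET(bitfield, 1U << shift) != 0);
}
```

With bit 3 set in the bitfield, the buggy form computes 8 & 3 and misses it, while the fixed form computes 8 & 8 and finds it.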


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@
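
One way an SLIST can end up looping forever, as r1.27 describes, is when the same entry is queued twice. The sketch below assumes head insertion and uses hypothetical names (struct event, double_insert_makes_cycle()); the log does not say exactly how the driver queued its events, so treat this as an illustration of the failure mode rather than the actual code path.

```c
#include <stddef.h>
#include <sys/queue.h>

struct event {
	SLIST_ENTRY(event) entry;
};
SLIST_HEAD(event_list, event);

/*
 * Queue the same entry twice.  The second insert sets the entry's
 * next pointer to the current head, which is the entry itself, so
 * any traversal of the list never reaches NULL.
 */
static int
double_insert_makes_cycle(void)
{
	struct event_list list = SLIST_HEAD_INITIALIZER(list);
	struct event ev;

	SLIST_INSERT_HEAD(&list, &ev, entry);
	SLIST_INSERT_HEAD(&list, &ev, entry);	/* same entry queued again */

	return (SLIST_NEXT(&ev, entry) == &ev);
}
```

Running the callback directly, as the commit does, avoids the need to link the entry into a list at all.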


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.81 09-Feb-2022 dlg

enable hw vlan tag handling in the rx path too.

also tested on both x86 and sparc64.


# 1.80 09-Feb-2022 dlg

enable hardware vlan tagging.

tested on sparc64 and x86


# 1.79 08-Feb-2022 dlg

bring back IPv4, TCP4/6 and UDP4/6 checksum offloading.

this was first introduced in r1.176 by jan@. this diff includes two
fixes to that implementation.

the most important one is to parse the ip and tcp headers before a
possible call to m_defrag. if an l4 offload is requested, it's only
requested by the stack when the payload is correctly aligned and
with each header contiguous in memory. this means you can use
m_getptr and cast the packet data to the relevant headers to read
them directly because that's what the stack does when it's working
on them. this makes it cheap to work on them too.

however, if you m_defrag, it ignores the alignment and ends up
making it unsafe to dereference the ip and tcp/udp payloads on
strict alignment architectures. if we want to look at the headers
after m_defrag, we'd likely have to copy them onto the stack first.

the other fix is to reset the offload bits between packets in the
loop in ixl_start.

another difference is that this code skips parsing the packet if
no checksum offload is requested.

tests and a tweak by bluhm@ to actually use the offloading
tested by me on sparc64 and x86 boxes

ok bluhm@ jmatthew@


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using an "int sized"
instructions, rather than the minimally sized byte instruction. This is
permitted by the language laywers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and removes the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly is allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual regsiters on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value localy.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and move it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but sets us up to
bump that number.


# 1.5 15-Dec-2017 dlg

put where i'm up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you don't have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

i'm pretty sure we don't need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.78 09-Jan-2022 jsg

spelling
feedback and ok tb@ jmc@ ok ratchov@


# 1.77 27-Nov-2021 deraadt

previous commit causes gcc to perform an unaligned access to the tcphdr
(at least on sparc64) since it accesses the bitfield using an "int sized"
instruction, rather than the minimally sized byte instruction. This is
permitted by the language lawyers who probably prefer we change the tcphdr
in every packet. It is not clear how to convince gcc to avoid this behaviour,
and a week of futzing hasn't found fast path solutions yet. In the meantime
the tree may not be broken.


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly it allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesn't support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
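
The polling scheme described in 1.60 can be sketched as follows. This is
only an illustration, not the driver's code: the struct, the function
names and the fixed 48 bit width are assumptions made for the example.
It also assumes the counter wraps at most once between two polls.

```c
#include <stdint.h>

/* Assumed width of the hardware counter field for this example. */
#define EXAMPLE_COUNTER_MASK	((1ULL << 48) - 1)

/* Hypothetical software state kept per hardware counter. */
struct example_counter {
	uint64_t last;	/* raw value seen at the previous poll */
	uint64_t total;	/* accumulated 64 bit value shown to users */
};

/*
 * The hardware counter is never reset, so the first read only
 * records a baseline.
 */
void
example_counter_init(struct example_counter *c, uint64_t raw)
{
	c->last = raw & EXAMPLE_COUNTER_MASK;
	c->total = 0;
}

/*
 * Add the difference since the previous poll. Masking the
 * subtraction makes a single wrap of the 48 bit field come out
 * as the correct small positive delta.
 */
void
example_counter_update(struct example_counter *c, uint64_t raw)
{
	uint64_t delta;

	raw &= EXAMPLE_COUNTER_MASK;
	delta = (raw - c->last) & EXAMPLE_COUNTER_MASK;

	c->total += delta;
	c->last = raw;
}
```

The poll interval has to be short enough that the field cannot wrap
twice between reads, otherwise a full wrap is lost.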


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@
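
The queue count arithmetic noted in 1.52 (a power of 2, capped at 8)
can be sketched like this. The helper below is hypothetical and is not
how intrmap_create() is implemented; it only shows the rounding that
the INTRMAP_POWEROF2 behaviour implies.

```c
/*
 * Hypothetical helper: pick the largest power of 2 that is not
 * greater than either the cpu count or the queue limit. Always
 * returns at least 1.
 */
unsigned int
example_nqueues(unsigned int ncpus, unsigned int maxqueues)
{
	unsigned int limit = (ncpus < maxqueues) ? ncpus : maxqueues;
	unsigned int n = 1;

	while (n * 2 <= limit)
		n *= 2;

	return n;
}
```

With 6 cpus and a limit of 8 this gives 4 queues, which matches the
"right thing" on the 6 cpu test box mentioned above.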


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
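
The ISSET mistake described in 1.44 can be reproduced in isolation.
This is a sketch under assumptions: the speed table, its ordering and
the function names are invented for the example and are not the
driver's real link speed layout. Only the shape of the ISSET() test
follows the commit message.

```c
#include <stdint.h>

/* Same shape as the kernel's ISSET() macro. */
#define ISSET(t, f)	((t) & (f))

/* Hypothetical table: the index is the bit position in the bitfield. */
static const uint64_t example_baudrates[] = {
	100000000ULL,		/* bit 0 (illustrative) */
	1000000000ULL,		/* bit 1 */
	10000000000ULL,		/* bit 2 */
	40000000000ULL,		/* bit 3 */
};

#define EXAMPLE_NSPEEDS \
	(sizeof(example_baudrates) / sizeof(example_baudrates[0]))

/* Buggy: tests the bitfield against the shift count itself. */
uint64_t
example_baudrate_buggy(uint8_t link_speed)
{
	unsigned int shift;

	for (shift = 0; shift < EXAMPLE_NSPEEDS; shift++) {
		if (ISSET(link_speed, shift))
			return example_baudrates[shift];
	}

	return 0;
}

/* Fixed: tests the bitfield against the bit at that position. */
uint64_t
example_baudrate_fixed(uint8_t link_speed)
{
	unsigned int shift;

	for (shift = 0; shift < EXAMPLE_NSPEEDS; shift++) {
		if (ISSET(link_speed, 1U << shift))
			return example_baudrates[shift];
	}

	return 0;
}
```

In this sketch a bitfield with only bit 2 or only bit 3 set never
matches in the buggy version, so the rate comes out as 0.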


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us up
to bump that number.


# 1.5 15-Dec-2017 dlg

put where i'm up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you don't have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

i'm pretty sure we don't need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.76 09-Nov-2021 jan

Add IPv4, TCP4/6 and UDP4/6 checksum offloading.

ok jmatthew@


Revision tags: OPENBSD_7_0_BASE
# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly is allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesn't support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
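
The polling scheme described above can be sketched as follows. This is an
illustration only, not the driver's code: the struct and function names are
hypothetical, and it assumes a free-running 48-bit hardware counter that is
never reset, so a baseline is taken on the first read and every later poll
adds the wrapped difference to a 64-bit software total.

```c
#include <stdint.h>

/* Assumption: the hardware counter is 48 bits wide and wraps to 0. */
#define COUNTER48_MASK	((1ULL << 48) - 1)

/* Hypothetical software state kept per hardware counter. */
struct counter48 {
	uint64_t last;	/* raw value seen at the previous poll */
	uint64_t total;	/* accumulated 64-bit count since the baseline */
};

/* Take a baseline; the hardware value is never reset, so start from it. */
static void
counter48_init(struct counter48 *c, uint64_t raw)
{
	c->last = raw & COUNTER48_MASK;
	c->total = 0;
}

/*
 * Fold a new raw reading into the total. Masking the subtraction to 48
 * bits gives the right delta even if the counter wrapped once since the
 * previous poll, which is why the polls have to be frequent enough.
 */
static void
counter48_update(struct counter48 *c, uint64_t raw)
{
	uint64_t delta;

	raw &= COUNTER48_MASK;
	delta = (raw - c->last) & COUNTER48_MASK;
	c->total += delta;
	c->last = raw;
}
```

If the counter can wrap more than once between two polls, the extra wraps
are lost, which is the overflow risk the commit message refers to.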


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@
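
As a rough illustration of the power of 2 constraint noted above: the
helper below is hypothetical and is not how intrmap_create works
internally. It assumes the queue count is rounded down to the largest
power of two that fits both the number of cpus and the driver's cap.

```c
/*
 * Hypothetical helper: pick the largest power of two that does not
 * exceed either the cpu count or the configured maximum. Always
 * returns at least 1.
 */
static unsigned int
pick_nqueues(unsigned int ncpus, unsigned int maxqueues)
{
	unsigned int n = 1;

	while (n * 2 <= ncpus && n * 2 <= maxqueues)
		n *= 2;

	return n;
}
```

Under that assumption a box with 6 cpus and a cap of 8 would end up with
4 queues.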


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
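
The ISSET mixup described above can be reproduced in isolation. The sketch
below is not the driver's code: the speed table and its bit layout are
made up for illustration, and ISSET is defined locally with the same shape
as the macro in <sys/param.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as ISSET() in OpenBSD's <sys/param.h>. */
#define ISSET(t, f)	((t) & (f))

/* Hypothetical table: the index is the bit position in the speed field. */
static const uint64_t speed_bps[] = {
	100000000ULL,		/* bit 0: 100M */
	1000000000ULL,		/* bit 1: 1G */
	10000000000ULL,		/* bit 2: 10G */
	40000000000ULL,		/* bit 3: 40G */
};
#define NSPEEDS	(sizeof(speed_bps) / sizeof(speed_bps[0]))

/* Buggy variant: uses the shift count itself as the mask. */
static uint64_t
baudrate_buggy(uint8_t link_speed)
{
	size_t shift;

	for (shift = 0; shift < NSPEEDS; shift++) {
		if (ISSET(link_speed, shift))
			return speed_bps[shift];
	}
	return 0;
}

/* Fixed variant: tests the bit at position shift. */
static uint64_t
baudrate_fixed(uint8_t link_speed)
{
	size_t shift;

	for (shift = 0; shift < NSPEEDS; shift++) {
		if (ISSET(link_speed, 1U << shift))
			return speed_bps[shift];
	}
	return 0;
}
```

With this made-up layout, a field with only bit 2 set (0x04) yields 0 from
the buggy variant, because 0x04 has no bits in common with the shift
counts 0 through 3, while the fixed variant reports 10G.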


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.75 23-Jul-2021 jmatthew

pci_intr_msix_count() is the function that drivers using multiple MSI-X
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.

fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@


Revision tags: OPENBSD_6_9_BASE
# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and removes the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly is allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual regsiters on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value localy.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and move it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@
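
The log does not say exactly how the list was corrupted. One plausible mechanism, shown here as a minimal sketch with a hand-rolled list standing in for SLIST, is inserting the same entry at the head twice, which makes the entry point at itself. The names `struct event`, `event_insert_head` and `event_count_bounded` are hypothetical and do not come from the driver.

```c
#include <stddef.h>

/* Hypothetical event record; not the driver's real structure. */
struct event {
	struct event *next;
};

struct event_list {
	struct event *first;
};

/* Head insertion: the new entry points at the old head. */
static void
event_insert_head(struct event_list *l, struct event *e)
{
	e->next = l->first;
	l->first = e;
}

/* Walk the list, stopping after `limit` nodes so a cycle cannot hang us. */
static int
event_count_bounded(const struct event_list *l, int limit)
{
	const struct event *e;
	int n = 0;

	for (e = l->first; e != NULL && n < limit; e = e->next)
		n++;
	return (n);
}
```

Inserting one entry twice leaves `e->next == e`, so an unbounded walk never terminates. Running the callback directly avoids keeping such a list at all.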


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but sets us up to
bump that number.


# 1.5 15-Dec-2017 dlg

put where im up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you dont have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

im pretty sure we dont need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.74 26-Mar-2021 jan

Add PCI ID for Intel X710 10G SFP+ NIC

ok patrick@


# 1.73 26-Feb-2021 jan

Add missing PCI product IDs for x710 10GBase-T into ixl(4)

OK phessler


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly it allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
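
The polling scheme described above can be sketched roughly as follows, assuming a 48-bit hardware counter (the 48-bit width is mentioned in r1.61). The names `COUNTER48_MASK`, `counter48_delta`, `struct soft_counter` and `soft_counter_poll` are hypothetical and are not taken from the driver.

```c
#include <stdint.h>

/* Assumed width of the hardware counter field. */
#define COUNTER48_MASK	((1ULL << 48) - 1)

/*
 * Difference between two raw 48-bit readings. Masking the unsigned
 * subtraction gives the right answer even if the counter wrapped once
 * between the two polls.
 */
static uint64_t
counter48_delta(uint64_t prev, uint64_t cur)
{
	return ((cur - prev) & COUNTER48_MASK);
}

/* Hypothetical software-side accumulator, one per hardware counter. */
struct soft_counter {
	uint64_t last;	/* raw value seen at the previous poll */
	uint64_t total;	/* 64-bit running total kept by software */
};

/* Fold a fresh raw reading into the running total. */
static void
soft_counter_poll(struct soft_counter *sc, uint64_t raw)
{
	sc->total += counter48_delta(sc->last, raw);
	sc->last = raw;
}
```

Because the hardware counters are never reset, `last` would need to be seeded with a first reading before the totals mean anything. The poll interval has to be short enough that the counter cannot wrap more than once between reads.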


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@
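
As a rough illustration of the power-of-2 constraint, the sketch below rounds a cpu count down to a power of two and caps it. It assumes that "the right thing" on a 6 cpu box means 4 queues. It is not the intrmap implementation, and `nqueues_pow2` is a hypothetical name.

```c
/*
 * Largest power of two that is <= min(ncpus, maxq), and at least 1.
 * Hypothetical helper; the real selection is done by intrmap_create()
 * when INTRMAP_POWEROF2 is set.
 */
static unsigned int
nqueues_pow2(unsigned int ncpus, unsigned int maxq)
{
	unsigned int limit = (ncpus < maxq) ? ncpus : maxq;
	unsigned int n = 1;

	while (n * 2 <= limit)
		n *= 2;
	return (n);
}
```

With a cap of 8, this gives 4 queues for 6 cpus and 8 queues for 16 cpus.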


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
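
The mix-up between a bit number and a bit mask can be shown with a small sketch. The ISSET definition below mirrors the usual one from sys/param.h. The helpers `speed_isset_buggy` and `speed_isset_fixed` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Same shape as ISSET() in <sys/param.h>; defined here to stay portable. */
#ifndef ISSET
#define ISSET(t, f)	((t) & (f))
#endif

/* Buggy form: passes the bit number where a mask is expected. */
static int
speed_isset_buggy(uint32_t bitfield, unsigned int shift)
{
	return (ISSET(bitfield, shift) != 0);
}

/* Fixed form: converts the bit number into a mask before testing. */
static int
speed_isset_fixed(uint32_t bitfield, unsigned int shift)
{
	return (ISSET(bitfield, 1U << shift) != 0);
}
```

With only bit 3 set (`0x08`), the buggy form computes `0x08 & 3` and reports the bit as clear, while the fixed form computes `0x08 & 0x08` and reports it as set. A lookup that never matches in this way would be consistent with the baudrate coming out as 0.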


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.




# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us up
to bump that number.


# 1.5 15-Dec-2017 dlg

put where i'm up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you don't have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

i'm pretty sure we don't need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.72 25-Jan-2021 dlg

if the rx descriptor reports the rss hash, use it for the mbuf flowid.

ok jmatthew@


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly it allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesn't support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
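
The polling scheme described in 1.60 can be sketched roughly as follows. This
is an illustration only, not the driver's code: struct counter_state,
counter48_update() and COUNTER48_MASK are hypothetical names, and the sketch
assumes a 48 bit hardware counter that wraps at most once between polls.
Because the counters never really get reset, the first poll would only seed
`last` rather than add to the total.

```c
#include <stdint.h>

/* hypothetical: mask for a 48 bit hardware counter */
#define COUNTER48_MASK	((1ULL << 48) - 1)

/* hypothetical per-counter software state kept between polls */
struct counter_state {
	uint64_t last;	/* raw hardware value seen at the previous poll */
	uint64_t total;	/* 64 bit running total kept in software */
};

/*
 * Fold a freshly read raw value into the running total.
 * The subtraction is done modulo 2^48, so a single wrap of the
 * hardware counter between two polls still gives the right delta.
 */
static void
counter48_update(struct counter_state *cs, uint64_t raw)
{
	uint64_t delta;

	raw &= COUNTER48_MASK;
	delta = (raw - cs->last) & COUNTER48_MASK;

	cs->total += delta;
	cs->last = raw;
}
```

If the counter could wrap more than once between polls the delta would be
undercounted, which is why the polling has to be frequent enough.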


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@
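
The queue count constraint noted in 1.52 can be illustrated with a small
sketch. This is not how intrmap_create() is implemented, and the log does not
say what "the right thing" on the 6 cpu box was. The assumption here is that
the count is rounded down to a power of 2 and capped at 8; nqueues_for() and
MAX_QUEUES are hypothetical names.

```c
/* hypothetical cap, taken from the "limiting it to 8" note above */
#define MAX_QUEUES	8

/*
 * Pick the largest power of 2 that is no bigger than the number of
 * cpus or the cap, whichever is smaller. Always returns at least 1.
 */
static unsigned int
nqueues_for(unsigned int ncpus)
{
	unsigned int n = 1;

	if (ncpus > MAX_QUEUES)
		ncpus = MAX_QUEUES;

	while (n * 2 <= ncpus)
		n *= 2;

	return (n);
}
```

Under these assumptions a 6 cpu machine would end up with 4 queues.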


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
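
The mistake described in 1.44 is easy to reproduce in isolation. The sketch
below is not the driver's code: bit_set_buggy() and bit_set_fixed() are
hypothetical helpers, and ISSET is defined locally in the same shape as the
kernel macro so the example stands alone.

```c
#include <stdint.h>

/* same shape as the kernel macro, defined here so the sketch is standalone */
#define ISSET(t, f)	((t) & (f))

/* buggy: passes the bit number itself as the mask */
static int
bit_set_buggy(uint32_t bitfield, unsigned int shift)
{
	return (ISSET(bitfield, shift) != 0);
}

/* fixed: turns the bit number into a mask before testing */
static int
bit_set_fixed(uint32_t bitfield, unsigned int shift)
{
	return (ISSET(bitfield, 1U << shift) != 0);
}
```

With only bit 4 set in the bitfield, the buggy test for bit 4 computes
0x10 & 4 and reports it as clear, while the fixed test computes 0x10 & 0x10
and reports it as set. A lookup keyed on such a test would miss the entry it
was looking for, which is consistent with the baudrate coming out as 0.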


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.71 22-Dec-2020 dlg

name the rx rings like ix does for systat mb


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and removes the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly is allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual regsiters on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value localy.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and move it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us up to
bump that number.


# 1.5 15-Dec-2017 dlg

put where im up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you don't have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

im pretty sure we dont need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.70 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly it allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesn't support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
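The poll-and-diff approach described in 1.60 can be sketched as follows. This is a minimal illustration, not the driver's code: `struct ex_counter`, `ex_counter_update()` and `EX_MASK48` are hypothetical names, and the sketch assumes the counter is polled often enough that it wraps at most once between two reads.

```c
#include <stdint.h>

/* hypothetical: mask for a 48 bit wide hardware counter */
#define EX_MASK48	((1ULL << 48) - 1)

/* hypothetical software state kept per hardware counter */
struct ex_counter {
	uint64_t last;	/* raw value seen at the previous poll */
	uint64_t total;	/* 64 bit running total built from the deltas */
};

/*
 * Fold a fresh raw reading into the running total.  Unsigned
 * subtraction followed by masking gives the right delta even if
 * the narrow hardware counter wrapped once since the last poll.
 */
static void
ex_counter_update(struct ex_counter *c, uint64_t now, uint64_t mask)
{
	uint64_t delta = (now - c->last) & mask;

	c->total += delta;
	c->last = now;
}
```

Because the hardware value is never reset, the first reading after attach would normally be stored in `last` without being added to `total`.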


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@
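The power of 2 constraint in 1.52 comes down to simple arithmetic, sketched below. This is not how intrmap_create() is implemented; `ex_pow2_queues()` is a hypothetical helper, and the sketch assumes the queue count is rounded down to a power of 2 and capped at a limit. The log entry does not state which count the 6 cpu box ended up with.

```c
/*
 * Hypothetical helper: pick the largest power of 2 that is no
 * bigger than either the number of cpus or the queue limit.
 * Always returns at least 1.
 */
static unsigned int
ex_pow2_queues(unsigned int ncpus, unsigned int maxq)
{
	unsigned int q = 1;

	if (ncpus > maxq)
		ncpus = maxq;

	/* compare against ncpus / 2 so that q * 2 cannot overflow */
	while (q <= ncpus / 2)
		q *= 2;

	return (q);
}
```

Under these assumptions, 6 cpus with a limit of 8 gives 4 queues, and 16 cpus with the same limit gives 8.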


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
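The bug described in 1.44 can be shown in isolation. This is a sketch only: `EX_ISSET()` mirrors the usual `((t) & (f))` shape of ISSET(), while `EX_SPEED_SHIFT` and the two helper functions are hypothetical and do not come from the driver.

```c
#include <stdint.h>

/* same shape as ISSET(): true if any bit of f is set in t */
#define EX_ISSET(t, f)	((t) & (f))

/* hypothetical bit position, for illustration only */
#define EX_SPEED_SHIFT	3

/* buggy form: the shift count itself is used as the mask */
static int
ex_speed_set_buggy(uint32_t bitfield)
{
	return (EX_ISSET(bitfield, EX_SPEED_SHIFT) != 0);
}

/* fixed form: the shift count is turned into a single bit mask */
static int
ex_speed_set_fixed(uint32_t bitfield)
{
	return (EX_ISSET(bitfield, 1U << EX_SPEED_SHIFT) != 0);
}
```

With bit 3 set, the buggy form tests against the mask 3 (bits 0 and 1) and reports nothing. That is the kind of mismatch that would leave the baudrate at 0.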


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.69 02-Nov-2020 dlg

add support for ctl reads and writes on 710 chips with API < 1.5

this gets rid of some annoying errors when bringing such an interface
up, but more importantly is allows RSS to work on these boards with
older firmware.

ok jmatthew@


Revision tags: OPENBSD_6_8_BASE
# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual regsiters on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value localy.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and move it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but sets us up to
bump that number.


# 1.5 15-Dec-2017 dlg

put where im up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you dont have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

im pretty sure we dont need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.68 16-Jul-2020 dlg

use a mutex to serialise the test and set of ifp->if_link_state.

this was serialised by NET_LOCK, but now i get link state change
information in an interrupt context, so i shouldn't (can't) do that
anymore.

ok jmatthew@


# 1.67 16-Jul-2020 dlg

sc_atq_mtx is unused, so get rid of it


# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
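
The polling scheme described in 1.60 can be sketched as below. This is an
illustration only, not the driver's code: the struct and function names are
hypothetical, and the 48-bit width is an assumption taken from the mention
of 48 bit counter reads in 1.61. The masked subtraction gives the right
difference across a single wrap, provided the chip is polled often enough
that the counter cannot wrap twice between polls.

```c
#include <stdint.h>

/* Assumed counter width; the hardware also has other field sizes. */
#define COUNTER48_MASK	((UINT64_C(1) << 48) - 1)

/* Hypothetical bookkeeping for one free-running hardware counter. */
struct counter48 {
	uint64_t last;	/* raw value seen at the previous poll */
	uint64_t total;	/* accumulated 64-bit total since init */
};

/* The chip never resets the counter, so the first read is only a baseline. */
void
counter48_init(struct counter48 *c, uint64_t raw)
{
	c->last = raw & COUNTER48_MASK;
	c->total = 0;
}

/* Add the difference since the previous poll, modulo 2^48. */
void
counter48_update(struct counter48 *c, uint64_t raw)
{
	uint64_t delta;

	raw &= COUNTER48_MASK;
	delta = (raw - c->last) & COUNTER48_MASK;
	c->total += delta;
	c->last = raw;
}
```

A periodic timeout would call counter48_update() with each fresh register
read, and consumers would report total rather than the raw value.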


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@
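
The queue count constraints listed in 1.52 can be illustrated with a small
sketch. This is not how intrmap_create does it internally; the function and
the constant below are hypothetical, and the only facts taken from the log
are that the count must be a power of 2 and is limited to 8 for now.

```c
/* Hypothetical cap, mirroring the "limiting it to 8 for now" note. */
#define QUEUE_LIMIT_SKETCH	8

/*
 * Return the largest power of 2 that is no bigger than both the number
 * of cpus and the cap. A cpu count of 0 is treated as 1.
 */
unsigned int
pick_queue_count(unsigned int ncpus)
{
	unsigned int limit, n;

	limit = ncpus < QUEUE_LIMIT_SKETCH ? ncpus : QUEUE_LIMIT_SKETCH;
	if (limit == 0)
		return (1);

	n = 1;
	while (n * 2 <= limit)
		n *= 2;

	return (n);
}
```

Under these assumptions a box with 6 cpus would end up with 4 queues.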


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
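
The bug described in 1.44 is easy to reproduce in isolation. The sketch
below is not the driver's code: the two helper functions are hypothetical,
and ISSET is defined locally in the usual BSD form in case the system
header is not included. Passing the bit position tests the wrong bits;
passing 1 << shift tests the intended one.

```c
#include <stdint.h>

#ifndef ISSET
#define ISSET(t, f)	((t) & (f))	/* usual BSD-style definition */
#endif

/* Buggy form: treats the bit position itself as the mask. */
int
speed_bit_set_buggy(uint32_t bitfield, unsigned int shift)
{
	return (ISSET(bitfield, shift) != 0);
}

/* Fixed form: builds the mask from the bit position first. */
int
speed_bit_set_fixed(uint32_t bitfield, unsigned int shift)
{
	return (ISSET(bitfield, UINT32_C(1) << shift) != 0);
}
```

With bit 4 set in the field (0x10), the buggy form computes 0x10 & 4 and
reports the bit as clear, which matches the baudrate coming out as 0.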


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish



# 1.66 12-Jul-2020 dlg

it's not an error if the API doesnt support reading sfp/qsfp stuff.

it's just not supported. the manpage says why.

ok sthen@ deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual regsiters on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value localy.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and move it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us
up to bump that number.


# 1.5 15-Dec-2017 dlg

put where im up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you dont have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

im pretty sure we dont need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.65 11-Jul-2020 dlg

don't complain if the API is too low to support "query phy types".

it just means we won't know what to show in ifconfig media output,
but that's not a huge deal. there's still some more issues around
api versions and driver support that we're working on though.

ok sthen@ jmatthew@


# 1.64 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.
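
The poll-and-diff approach described above can be sketched like this. It is a minimal illustration, not the code from the diff: `ctr48_delta`, `ctr_accumulate` and `struct ctr_state` are hypothetical names, and it assumes a free-running 48 bit counter that wraps at most once between two polls.

```c
#include <stdint.h>

#define CTR48_MASK	((UINT64_C(1) << 48) - 1)

/* software state kept per hardware counter (hypothetical layout) */
struct ctr_state {
	uint64_t last;		/* raw value seen at the previous poll */
	uint64_t total;		/* 64 bit running total exposed to readers */
};

/* difference between two raw reads, correct across a single wrap */
uint64_t
ctr48_delta(uint64_t prev, uint64_t cur)
{
	return ((cur - prev) & CTR48_MASK);
}

/* fold a freshly read raw value into the running total */
void
ctr_accumulate(struct ctr_state *cs, uint64_t raw)
{
	raw &= CTR48_MASK;
	cs->total += ctr48_delta(cs->last, raw);
	cs->last = raw;
}
```

Because the subtraction is done modulo 2^48, the hardware counter never needs to be reset. The poll interval only has to be short enough that the counter cannot wrap twice between reads.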


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual registers on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.
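
For readers unfamiliar with it, a bit-at-a-time Toeplitz hash can be sketched as below. This is a generic software illustration of the algorithm, not the chip's implementation or the driver's code. The function names are hypothetical, and `rss_ring` is a simplification that masks the hash directly, whereas real hardware indexes a lookup table.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash: for every set bit of the input (most significant
 * bit first), xor in the 32 bit window of the key that starts at
 * that bit position.  Returns 0 if the key is too short.
 */
uint32_t
toeplitz_hash(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t hash = 0, window;
	size_t i;
	int b;

	if (keylen < datalen + 4)
		return (0);

	window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
	    (uint32_t)key[2] << 8 | (uint32_t)key[3];

	for (i = 0; i < datalen; i++) {
		for (b = 7; b >= 0; b--) {
			if (data[i] & (1U << b))
				hash ^= window;
			/* slide the window along the key by one bit */
			window <<= 1;
			if (key[i + 4] & (1U << b))
				window |= 1;
		}
	}

	return (hash);
}

/* simplified ring selection; nrings must be a power of two */
unsigned int
rss_ring(uint32_t hash, unsigned int nrings)
{
	return (hash & (nrings - 1));
}
```

Packets of the same flow hash to the same value, so they stay in order on one ring while different flows spread across the rings.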

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.
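
The power of 2 constraint and the cap noted above amount to rounding the cpu count down, roughly as in this sketch. It does not show how intrmap_create is implemented. `nqueues_for_cpus` and `QUEUES_MAX` are hypothetical names used only for illustration.

```c
#define QUEUES_MAX	8	/* hypothetical cap, per "8 for now" above */

/*
 * Largest power of two that is no bigger than ncpus or QUEUES_MAX.
 * Always returns at least 1.
 */
unsigned int
nqueues_for_cpus(unsigned int ncpus)
{
	unsigned int n = 1;

	while (n * 2 <= ncpus && n * 2 <= QUEUES_MAX)
		n *= 2;

	return (n);
}
```

On a box with 6 cpus this yields 4 queues, and anything with 8 or more cpus gets 8.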

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue to move it to the idle queue,
actually remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.
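
The difference between the two tests can be reproduced in isolation. This sketch defines ISSET locally the way sys/param.h does. The two helper functions are hypothetical and exist only to contrast the buggy and fixed expressions.

```c
#include <stdint.h>

#define ISSET(t, f)	((t) & (f))

/* buggy form: uses the bit number itself as the mask */
int
bit_test_wrong(uint32_t bits, unsigned int shift)
{
	return (ISSET(bits, shift) != 0);
}

/* fixed form: turns the bit number into a single-bit mask first */
int
bit_test_right(uint32_t bits, unsigned int shift)
{
	return (ISSET(bits, UINT32_C(1) << shift) != 0);
}
```

With only bit 3 set (0x08), the buggy form computes 0x08 & 3, which is 0, so the speed is never matched. With shift 0 it can never match anything at all, because the mask is 0.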

ok jmatthew@


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.
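
A minimal sketch of the idea, assuming a ring of descriptors with a flags field. `AQ_FLAG_DONE`, `struct aq_desc` and `aq_completed` are hypothetical names for illustration and are not copied from the driver or the datasheet.

```c
#include <stdint.h>

#define AQ_FLAG_DONE	0x0001	/* hypothetical "firmware is done" bit */

struct aq_desc {
	uint16_t flags;
	uint16_t opcode;
};

/*
 * Count the entries between cons and prod that firmware has marked
 * done, stopping at the first one that is still pending instead of
 * blindly completing everything up to prod.
 */
unsigned int
aq_completed(const struct aq_desc *ring, unsigned int nslots,
    unsigned int cons, unsigned int prod)
{
	unsigned int done = 0;

	while (cons != prod) {
		if (!(ring[cons].flags & AQ_FLAG_DONE))
			break;		/* not written back yet */
		done++;
		cons = (cons + 1) % nslots;
	}

	return (done);
}
```

Stopping at the first pending entry means a command's reply is only consumed after the firmware has actually written it back.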

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.63 07-Jul-2020 dlg

fix a copy pasto.

from netbsd if_ixl.c r1.39 by yamaguchi


# 1.62 07-Jul-2020 dlg

don't try and be too clever in the kstat update timeout.


# 1.61 07-Jul-2020 dlg

remove some old code from a previous version of the kstat diff.

ixl(4) is only enabled on 64bit archs, so we don't need a 32bit
fallback for the 48 and 64 bit counter reads.


# 1.60 07-Jul-2020 dlg

add kstat support for reading hardware counters.

this chip is annoyingly complicated, which is reflected in how
complicated it is to read counters off the chip. while we just use
ixl as a normal network interface, the chip is capable of being a
switch with physical ports, virtual ports and all sorts of other
functionality, and there are counters in different places for all
these different pieces. in our simple setup the driver interface
is mapped to a single physical port which we talk to via a single
virtual switch interface. this diff adds counters on each interface
for the physical port and for the virtual switch interface (vsi).

the port counters show what the hardware is doing, while the vsi
counters show how the driver is interacting with the chip. for
things like packet counters, these numbers tend to correlate strongly,
but there are some differences. if the chip drops packets cos there's
no descriptors on the rx ring, that's shown in the vsi counters.
problems talking to the physical network (eg, packet corruption off
the wire) are reported on the port counters.

on top of the chip just being complicated, reading the counters is
a complicated activity on its own. because the counters can be read
by multiple consumers in a virtualised environment, the counters
never really get reset. they are also stored in annoyingly small
fields. this means you basically have to poll the chip periodically
and calculate differences between the polls to avoid losing numbers
if they overflow too quickly.


# 1.59 26-Jun-2020 dlg

whitespace fixes, no functional change.


# 1.58 26-Jun-2020 dlg

fix link state handling so we can see link go both up and down.


# 1.57 25-Jun-2020 dlg

LUT writes go through actual regsiters on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@
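
the queue count arithmetic noted above (6 cpus, a cap of 8, power of 2 only) can be sketched as below. nqueues_pow2() is a hypothetical helper for illustration; it is not how intrmap_create() is implemented, it only shows the rounding the INTRMAP_POWEROF2 flag is described as asking for.

```c
/*
 * Hypothetical helper: pick the largest power of 2 that is no bigger
 * than either the number of cpus or the driver's own limit. Always
 * returns at least 1.
 */
static unsigned int
nqueues_pow2(unsigned int ncpus, unsigned int limit)
{
	unsigned int n = ncpus, q = 1;

	if (n > limit)
		n = limit;

	/* comparing against n / 2 avoids overflowing q * 2 */
	while (q <= n / 2)
		q *= 2;

	return (q);
}
```

with 6 cpus and a limit of 8 this gives 4 queues, which matches the "did the right thing" result mentioned in the commit message.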


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
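
the ISSET mixup described in this commit can be shown with a small sketch. ISSET is defined locally here to mirror the usual kernel macro, and EXAMPLE_SPEED_SHIFT plus the two functions are hypothetical names chosen for illustration, not identifiers from the driver.

```c
#include <stdint.h>

#define ISSET(_v, _m)	((_v) & (_m))

/* Hypothetical bit position of one link speed flag. */
#define EXAMPLE_SPEED_SHIFT	2

/* Buggy form: the shift count itself is used as the mask. */
static int
speed_bit_wrong(uint32_t link_speed)
{
	return (ISSET(link_speed, EXAMPLE_SPEED_SHIFT) != 0);
}

/* Fixed form: the mask is built from the shift. */
static int
speed_bit_right(uint32_t link_speed)
{
	return (ISSET(link_speed, 1U << EXAMPLE_SPEED_SHIFT) != 0);
}
```

with a shift of 2 the buggy form tests the value 2 (bit 1) instead of the value 4 (bit 2), so a field with only bit 2 set never matches. a lookup that never matches is consistent with the baudrate coming out as 0.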


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@
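
one way an SLIST can end up looping forever is sketched below. this is an assumption about the failure mode, using the standard sys/queue.h macros; struct event and double_insert_makes_cycle() are hypothetical names and the driver's actual list handling is not shown in this log.

```c
#include <stddef.h>
#include <sys/queue.h>

struct event {
	int			 id;
	SLIST_ENTRY(event)	 entry;
};
SLIST_HEAD(event_list, event);

/* Returns 1 if queuing the same element twice leaves it linked to itself. */
static int
double_insert_makes_cycle(void)
{
	struct event_list list = SLIST_HEAD_INITIALIZER(list);
	struct event ev = { .id = 1 };

	SLIST_INSERT_HEAD(&list, &ev, entry);
	/* second insert sets ev's next pointer to the old head, which is ev */
	SLIST_INSERT_HEAD(&list, &ev, entry);

	return (SLIST_NEXT(&ev, entry) == &ev);
}
```

once an element points at itself, any walk of the list never reaches the end. running the callback directly, as this commit does, removes the need to link the entry anywhere.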


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us up
to bump that number.


# 1.5 15-Dec-2017 dlg

put where i'm up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you don't have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

i'm pretty sure we don't need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.57 25-Jun-2020 dlg

LUT writes go through actual regsiters on 710s, not ctl writes.

found by jmatthew@


# 1.56 25-Jun-2020 dlg

add rss/toeplitz support for 710 chips.

this basically distributes incoming packets over the rx rings, where
without this they would all land on ring 0.

note that the 722 stuff is stubbed out at the moment. i don't have
an x722 to test with, so it's hard to get motivated to write the
code for it.

this is based on stuff supplied by christiano haesbaert.


# 1.55 25-Jun-2020 dlg

use the ixl_chip struct to store different rss_hena settings for 710/722


# 1.54 25-Jun-2020 dlg

add definitions for rss bits.

based on info from christiano haesbaert


# 1.53 25-Jun-2020 dlg

we're close to a point where the differences between 710s and 722s matter.

this adds a struct ixl_chip, which should hold the differences in
functionality between 710s and 722s. this adds which type of chip
each product is to the ixl_devices array.

based on stuff from christiano haesbaert


# 1.52 25-Jun-2020 dlg

use intrmap to set up multiple queues across multiple cpus.

ixl(4) is only enabled on amd64 and sparc64, and both of them now
support pci_intr_establish_cpu(), so it is safe to apply this.

a few things to note:

- the chip only supports a power of 2 number of queues, (ie, 1, 2,
4, 8, etc), so this also tests the INTRMAP_POWEROF2 flag to
intrmap_create. i tested this on a box with 6 cpus and it did the
right thing.
- the chip can support a lot of vectors, but we're limiting it to
8 for now.
- rss with toeplitz is not implemented yet, so all rxed packets end
up on the 0th queue for now.

jmatthew@ had done most of the work already, and christiano haesbaert
provided some hints to motivate me to work on this bit of it.

tested by jmatthew@ on sparc64
ok jmatthew@


# 1.51 24-Jun-2020 dlg

set IFQ_SET_MAXLEN to the number of slots on the tx ring, not 1.

this effectively enables tx mitigation on this chip. hrvoje popovski
tested it and discovered it adds about 20% to forwarding performance
on his test machine, and brings it more in line with ix(4) performance.

jmatthew thinks i copied setting it to 1 from myx, but myx resets
it to a proper value later on when it figures out what the chip is
capable of. how embarrassment.


# 1.50 24-Jun-2020 dlg

get rid of the per device sff lock because we only use the global one.

no functional change


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value localy.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and move it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but this sets us
up to bump that number.


# 1.5 15-Dec-2017 dlg

put where im up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you dont have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

im pretty sure we dont need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.49 21-Jun-2020 jmatthew

The onboard interfaces on T7/S7 machines don't provide a valid MAC address
for themselves, so use the "local-mac-address" Open Firmware property
instead, as done in ix(4).

ok dlg@


# 1.48 09-May-2020 jmatthew

Use MSI-X interrupts where available. The first vector is used for
events and command completions as that's the only vector they can go to.
tx/rx queues are mapped to subsequent vectors.

ok mpi@ dlg@


Revision tags: OPENBSD_6_7_BASE
# 1.47 22-Apr-2020 mpi

Use I40E_QUEUE_TYPE_EOL instead of hardcoding its value locally.

ok jmatthew@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
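
A minimal sketch of the mistake described in 1.44, assuming ISSET() has the same shape as the macro in <sys/param.h>. The ex_ helpers and the bit positions used with them are hypothetical and are not the driver's link speed table.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as ISSET() in <sys/param.h>; defined here to stay standalone. */
#ifndef ISSET
#define ISSET(t, f)	((t) & (f))
#endif

/* Buggy form: the shift count itself is used as the mask. */
static int
ex_isset_buggy(uint32_t bitfield, unsigned int shift)
{
	assert(shift < 32);
	return (ISSET(bitfield, shift) != 0);
}

/* Fixed form: the mask is built from the shift before testing. */
static int
ex_isset_fixed(uint32_t bitfield, unsigned int shift)
{
	assert(shift < 32);
	return (ISSET(bitfield, UINT32_C(1) << shift) != 0);
}
```

With bit 3 set and a shift of 3, the buggy form tests against the value 3 (bits 0 and 1) and misses the bit, while the fixed form tests against 1 << 3 and finds it. A shift of 0 can never match in the buggy form, because the mask is then 0.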


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.46 19-Nov-2019 yasuoka

Correct the link speed constant variables. From Shoichi Yamaguchi.

ok jmatthew


Revision tags: OPENBSD_6_6_BASE
# 1.45 02-Oct-2019 yasuoka

When dequeuing an aqb from the live queue and moving it to the idle queue,
remove it from the live queue. Found by Shoichi Yamaguchi.

ok dlg


# 1.44 30-Sep-2019 dlg

use the right values when figuring out which if_baudrate to use.

i was basically using ISSET(bitfield, shift) instead of
ISSET(bitfield, 1 << shift), so things didn't line up properly.

before this the baudrate would come out as 0, but lacp mode in
trunk(4) wants a non-zero if_baudrate value to help pick the best
aggregator to use. if the port reports 0, it doesn't get selected.
this probably explains why trunk(4) doesn't like doing lacp on
virtual ethernet interfaces too, but at least i can get a number
that actually means something on a hardware interface. alternatively,
use aggr(4), which doesn't care about the baudrate.

ok jmatthew@
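
A minimal sketch of the mistake described in 1.44, not the driver's actual
code. The helper names speed_bit_set_buggy and speed_bit_set_fixed are
hypothetical; ISSET is written out with the same shape as the kernel macro.
Passing the bit position as the mask tests the wrong bits, so a set bit can
go undetected.

```c
#include <stdint.h>

/* Same shape as the kernel's ISSET() macro: test bits by mask. */
#ifndef ISSET
#define ISSET(t, f)	((t) & (f))
#endif

/* Buggy form (hypothetical helper): the bit position is used as the mask. */
static int
speed_bit_set_buggy(uint32_t bitfield, unsigned int shift)
{
	return (ISSET(bitfield, shift) != 0);
}

/* Fixed form (hypothetical helper): build a one-bit mask from the position. */
static int
speed_bit_set_fixed(uint32_t bitfield, unsigned int shift)
{
	return (ISSET(bitfield, UINT32_C(1) << shift) != 0);
}
```

With bit 4 set in the bitfield (0x10), the buggy form reports it as clear
while the fixed form reports it as set.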


# 1.43 27-Aug-2019 dlg

don't check the page number was set correctly.

there are some extremely terrible modules out there that violate
specs badly. some SFP+ DACs i have don't support page switching,
and always report page 0xff. let userland look at them anyway.


# 1.42 27-Aug-2019 dlg

make SIOCGIFSFFPAGE support QSFP modules better.

basically the "read phy register" admin op uses the "address" field
on SFP modules for the i2c address, but uses that same field for
the "sff page number" for QSFP. this wraps the calls to the admin
op with sfp and qsfp specific code, and chooses between them depending
on what type of phy the fw reports.

tested by me and jmatthew@ pretty hard


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()
ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@
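
A rough sketch of the idea in 1.29, under stated assumptions: the struct
layout, the AQ_FLAG_DONE name and bit position, and the aq_reap helper are
all hypothetical and only illustrate checking a firmware-set done bit before
completing an entry. The loop stops at the first entry the firmware has not
finished instead of completing everything up to the producer index.

```c
#include <stdint.h>

/* Hypothetical done flag; the real name and bit are defined by the driver. */
#define AQ_FLAG_DONE	(1U << 0)

/* Hypothetical, heavily trimmed admin queue descriptor. */
struct aq_desc {
	uint16_t flags;
	uint16_t opcode;
};

/*
 * Count how many entries between cons and prod can be completed.
 * Stops at the first entry whose done bit is not set yet.
 */
static unsigned int
aq_reap(const struct aq_desc *ring, unsigned int nslots,
    unsigned int cons, unsigned int prod)
{
	unsigned int done = 0;

	while (cons != prod) {
		if (!(ring[cons].flags & AQ_FLAG_DONE))
			break;	/* firmware has not finished this command */
		done++;
		cons = (cons + 1) % nslots;
	}

	return (done);
}
```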


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.


# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@
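
A small illustration of the failure mode described in 1.27, assuming a
hand-rolled singly linked list rather than the kernel's SLIST macros; all
names here are hypothetical. Inserting an element that is already at the
head makes it point at itself, so a plain traversal never reaches the end.

```c
#include <stddef.h>

/* Hypothetical list element and head, standing in for an SLIST. */
struct entry {
	struct entry *next;
};

struct list {
	struct entry *first;
};

/* Head insertion with no check for whether e is already queued. */
static void
list_insert_head(struct list *l, struct entry *e)
{
	e->next = l->first;
	l->first = e;
}

/*
 * Walk the list, giving up after limit nodes.
 * Returns the length, or -1 if the end was not reached within limit.
 */
static int
list_length_bounded(const struct list *l, int limit)
{
	const struct entry *e;
	int n = 0;

	for (e = l->first; e != NULL; e = e->next) {
		if (++n > limit)
			return (-1);
	}

	return (n);
}
```

Queueing the same element twice in a row leaves its next pointer referring
to itself, which is why the bounded walk gives up instead of terminating.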


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.
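
A sketch of the arithmetic behind 1.22, assuming a hypothetical helper name
and sizes; it is not taken from the driver. Placing the receive buffer at
the end of the cluster leaves all unused bytes in front of the data, which
is the leading space m_prepend can use without allocating another mbuf.

```c
#include <stddef.h>

/*
 * Hypothetical helper: offset of the rx buffer from the start of the
 * cluster when the buffer is pushed to the end. The result is also the
 * amount of leading space left in front of the packet data.
 */
static size_t
rx_data_offset(size_t cluster_size, size_t buf_len)
{
	if (buf_len >= cluster_size)
		return (0);	/* buffer fills the cluster, no headroom */

	return (cluster_size - buf_len);
}
```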


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

im pretty sure we dont need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@


# 1.41 29-Jul-2019 jmatthew

remove some VF bits now that iavf(4) exists

ok dlg@


# 1.40 21-Jul-2019 dlg

apply backpressure when ifiq says the stack is getting busy


# 1.39 04-Jul-2019 jmatthew

when the mac address changes, update the mac/vlan filters accordingly.

ok dlg@


# 1.38 04-May-2019 jsg

fix array bounds check in ixl_search_link_speed()

ok dlg@


# 1.37 15-Apr-2019 visa

Use timeout_del_barrier(9) instead of timeout_del(9) followed by
conditional timeout_barrier(9).

OK kn@ dlg@


Revision tags: OPENBSD_6_5_BASE
# 1.36 10-Apr-2019 phessler

Add support for X722 to ixl(4)

full list of pci ids from sthen@

OK sthen@, jmatthew@, deraadt@


# 1.35 10-Apr-2019 dlg

implement support for SIOCGIFSFFPAGE

this relies on some firmware commands for accessing the i2c bus
that are not available until a relatively recent API version. our
nics using API 1.4 and 1.5 don't handle the command, but the API
1.7 nic we have is happy to talk to the module that is plugged into
it.

xl710 cards (the 40g ones with a qsfp+ connector) can be split up
into 4 functions that represent lanes on a single port. you can get
qsfp+ to 4x sfp+ cables so you can use the different lanes/functions
as completely independent interfaces. however, because each lane/
function is one port and therefore module, we need to serialise
access to the module by at least the port. this is to prevent
concurrent reads of different pages of the one module from stepping
on each other. i took the easy path and made a single ixlsff lock,
which is at least conservative.


# 1.34 01-Apr-2019 jmatthew

Enable the full pre-reset code path again, now that we've fixed the cause
of the pcie errors.

ok dlg@


# 1.33 01-Apr-2019 jmatthew

Don't use a prefetchable mapping for the registers - the controller only
accepts specific read and write widths, so prefetching and write combining
can cause it to generate pcie errors.

ok dlg@


# 1.32 29-Mar-2019 dlg

remove ifiq_barrier in ixl_down cos ifiq tasks don't use nic resources.

ixl_down frees all the dma memory used by the rings, but that memory
isn't used by ifiqs, so there's no need to wait for them to finish


# 1.31 22-Mar-2019 dlg

back out 1.28

i misread the doco, having promisc vlan reception is what we want.


# 1.30 22-Mar-2019 dlg

use a cond in ixl_atq_exec to wait for a command to be completed.

the main change is to not run ixl_atq_done, cos the interrupt handler
is supposed to do that for us.


# 1.29 22-Mar-2019 dlg

don't blindly complete admin queue entries

there's a bit in the flags field that the firmware sets when the
command is done, so check to decide if the command is ready to be
completed. this in turn makes ixl_iff work.

"oh" jmatthew@


# 1.28 21-Mar-2019 dlg

don't set VLAN things when configuring promisc

the doco says vlan things should only be set if you're manipulating
something on a specific vlan, otherwise the config applies to
everything, which is how the stack wants things to work at the
moment.

there's still something wrong in here, but let's get this out of
the way first.




# 1.27 21-Mar-2019 dlg

run event callbacks directly in the atq processing

previously events were queued on an SLIST, but multiple link state
events could fire with the same callback. this corrupted the SLIST
and effectively caused an infinite loop.

ok jmatthew@


# 1.26 12-Mar-2019 jmatthew

Until we can figure out why it causes NMIs on some machines, skip the
pre-reset steps described in Intel's datasheet and also their driver code.

ok dlg@


# 1.25 06-Mar-2019 jmatthew

Ignore rx interrupts if the interface isn't supposed to be running.
In some situations the pxe rom seems to leave rx interrupts
pending, so we get them as soon as we turn interrupts on.
Trying to process rx interrupts before we've allocated an rx ring
leads to crashes.

ok dlg@


# 1.24 01-Mar-2019 dlg

use ifiq_input instead of if_input

call if_rxr_livelocked if ifiq_input says to slow down


# 1.23 26-Feb-2019 dlg

tweak the mbuf loading in the tx path so it's easier to read.

count mbuf load failures as output errors so i can see if that's
a problem (it's not, but at least i can see it isn't now).


# 1.22 26-Feb-2019 dlg

put the rx buffer at the end of the cluster.

makes m_prepend later less likely to allocate a new mbuf.


# 1.21 26-Feb-2019 dlg

avoid a deadlock in ixl_down when calling ifq_barrier.

this is particularly noticeable on sparc64 when you reboot.

ok jmatthew@


# 1.20 24-Feb-2019 dlg

get rid of an unused softc member


# 1.19 01-Feb-2019 jmatthew

fix up calculation of our physical function id, making the second port
on dual port cards work much better.

ok dlg@


# 1.18 29-Jan-2019 dlg

get rid of some more debug printfs

suggested by jmatthew@


# 1.17 29-Jan-2019 dlg

don't need to print the base queue number.

ok jmatthew@


# 1.16 22-Jan-2019 jmatthew

Increase hardmtu to the maximum according to the datasheet and set the rx
packet size limit to match so jumbos actually work. Larger packets are
split across multiple buffers on the ring, so the buffers themselves stay
the same size.

ok dlg@


# 1.15 22-Jan-2019 jmatthew

Add and remove mac filters for multicast addresses.

ok dlg@


# 1.14 21-Jan-2019 jmatthew

also add a mac filter that ignores vlans for the broadcast address, so we can
see arp requests on vlans, among other things.

ok dlg@


# 1.13 20-Jan-2019 jmatthew

Replace the default mac filter with one that ignores vlans, and enable
promisc vlan mode so we can see vlan tagged traffic.

ok dlg@


# 1.12 20-Jan-2019 jmatthew

implement ixl_rxrinfo, dynamically allocating the buffer to prepare for
multiple receive rings at some point in the future.

ok dlg@


# 1.11 20-Jan-2019 jmatthew

Handle link state change interrupts by issuing IXL_AQ_OP_PHY_LINK_STATUS
to the admin queue. We don't need to wait for or process the reply,
because the existing admin reply queue processing already does it.

ok dlg@


# 1.10 19-Jan-2019 jmatthew

actually set CAUSE_ENA on the rx and tx queues, and re-enable interrupts
at the start of the interrupt handler. now it works well enough to commit
over.

ok dlg@


# 1.9 18-Jan-2019 jmatthew

pack hmc bits in the right order

ok dlg@


# 1.8 18-Nov-2018 jmatthew

request notification of link state changes, which helps us detect
link when it takes a bit longer to establish.

ok dlg@


Revision tags: OPENBSD_6_3_BASE OPENBSD_6_4_BASE
# 1.7 21-Dec-2017 dlg

add some ifq and ifiq barriers in ixl_down.

move rxfill in ixl_up so the qtail is only written once.


# 1.6 21-Dec-2017 dlg

now that we have multiple input queues in ifnet structs, use them.

for now we still only have one set of tx and rx rings, but sets us up to
bump that number.


# 1.5 15-Dec-2017 dlg

put where im up to into the tree so jmatthew@ can look at it.


# 1.4 29-Nov-2017 dlg

turns out you dont have to configure a vsi as the default in a veb.

what a waste of two days.


# 1.3 29-Nov-2017 dlg

let this build on sparc64 again.


# 1.2 28-Nov-2017 dlg

remove the #if 0ed out ixl_add_veb now that cvs has backed it up

im pretty sure we dont need it if we're just using the chip as a
single ethernet port.


# 1.1 28-Nov-2017 dlg

add ixl(4) for the "Intel Ethernet 700 Series"

this doesn't work yet, but it very recently got too big to hack on
without cvs to help me manage further changes to it.

ok deraadt@

