#
309447 |
|
02-Dec-2016 |
jhb |
MFC 303522,303647,303860,303880,304168-304170,304479,304482,304485,305548, 305549: Chelsio T4/T5 VF driver.
303522: Various fixes to the t4/5nex character device.
- Remove null open/close methods. - Don't set d_flags to 0 explicitly. - Remove t5_cdevsw as the .d_name member isn't really used and doesn't warrant a separate cdevsw just for the name. - Use ENOTTY as the error value for an unknown ioctl request. - Use make_dev_s() to close race with setting si_drv1.
303647: Store the offset of the KDOORBELL and GTS registers in the softc.
VF devices use a different register layout than PF devices. Storing the offset in a value in the softc allows code to be shared between the PF and VF drivers.
303860: Reserve an adapter flag IS_VF to mark VF devices vs PF devices.
303880: Track the base absolute ID of ingress and egress queues.
Use this to map an absolute queue ID to a logical queue ID in interrupt handlers. For the regular cxgbe/cxl drivers this should be a no-op as the base absolute ID should be zero. VF devices have a non-zero base absolute ID and require this change. While here, export the absolute ID of egress queues via a sysctl.
304168: Make SGE parameter handling more VF-friendly.
Add fields to hold the SGE control register and free list buffer sizes to the sge_params structure. Populate these new fields in t4_init_sge_params() for PF devices and change t4_read_chip_settings() to pull these values out of the params structure instead of reading registers directly. This will permit t4_read_chip_settings() to be reused for VF devices which cannot read SGE registers directly.
While here, move the call to t4_init_sge_params() to get_params__post_init(). The VF driver will populate the SGE parameters structure via a different method before calling t4_read_chip_settings().
304169: Update mailbox writes to work with VF devices.
- Use alternate register locations for the data and control registers for VFs. - Do a dummy read to force the writes to the mailbox data registers to post before the write to the control register on VFs. - Do not check the PCI-e firmware register for errors on VFs.
304170: Add support for register dumps on VF devices.
- Add handling of VF register sets to t4_get_regs_len() and t4_get_regs(). - While here, use t4_get_regs_len() in the ioctl handler for regdump instead of inlining it.
304479: Add structures for VF-specific adapter parameters.
While here, mark which parameters are PF-specific and which are VF-specific.
304482: Adjust t4_port_init() to work with VF devices.
Specifically, the FW_PORT_CMD may or may not work for a VF (the PF driver can choose whether or not to permit access to this command), so don't attempt to fetch port information on a VF if permission is denied by the PF.
304485: Reorder sysctls so that nodes shared with the VF driver are added first.
This permits a single early return for VF devices in the routines that add sysctl nodes.
305548: Don't break out of the m_advance() loop if len drops to zero.
If a packet contains the Ethernet header (14 bytes) in the first mbuf and the payload (IP + UDP + data) in the second mbuf, then the attempt to fetch the l3hdr will return a NULL pointer. The first loop iteration will drop len to zero and exit the loop without setting 'p'. However, the desired data is at the start of the second mbuf, so the correct behavior is to loop around and let the conditional set 'p' to m_data of the next mbuf (and leave offset as 0).
305549: Chelsio T4/T5 VF driver.
The cxgbev/cxlv driver supports Virtual Function devices for Chelsio T4 and T4 adapters. The VF devices share most of their code with the existing PF4 driver (cxgbe/cxl) and as such the VF device driver currently depends on the PF4 driver.
Similar to the cxgbe/cxl drivers, the VF driver includes a t4vf/t5vf PCI device driver that attaches to the VF device. It then creates child cxgbev/cxlv devices representing ports assigned to the VF. By default, the PF driver assigns a single port to each VF.
t4vf_hw.c contains VF-specific routines from the shared code used to fetch VF-specific parameters from the firmware.
t4_vf.c contains the VF-specific PCI device driver and includes its own attach routine.
VF devices are required to use a different firmware request when transmitting packets (which in turn requires a different CPL message to encapsulate messages). This alternate firmware request does not permit chaining multiple packets in a single message, so each packet results in a firmware request. In addition, the different CPL message requires more detailed information when enabling hardware checksums, so parse_pkt() on VF devices must examine L2 and L3 headers for all packets (not just TSO packets) for VF devices. Finally, L2 checksums on non-UDP/non-TCP packets do not work reliably (the firmware trashes the IPv4 fragment field), so IPv4 checksums for such packets are calculated in software.
Most of the other changes in the non-VF-specific code are to expose various variables and functions private to the PF driver so that they can be used by the VF driver.
Note that a limited subset of cxgbetool functions are supported on VF devices including register dumps, scheduler classes, and clearing of statistics. In addition, TOE is not supported on VF devices, only for the PF interfaces.
Sponsored by: Chelsio Communications
|
#
284052 |
|
06-Jun-2015 |
np |
MFC r276480, r276485, r276498, r277225, r277226, r277227, r277230, r277637, and r283149 (by emaste@).
r276485 is the real change here, the rest deal with the fallout of mp_ring's reliance on 64b atomics.
Use the incorrectly spelled 'eigth' from struct pkthdr in this branch instead of MFC'ing r261733, which would have renamed the field of a public structure in a -STABLE branch. ---
r276480: Temporarily unplug cxgbe(4) from !amd64 builds.
r276485: cxgbe(4): major tx rework.
a) Front load as much work as possible in if_transmit, before any driver lock or software queue has to get involved.
b) Replace buf_ring with a brand new mp_ring (multiproducer ring). This is specifically for the tx multiqueue model where one of the if_transmit producer threads becomes the consumer and other producers carry on as usual. mp_ring is implemented as standalone code and it should be possible to use it in any driver with tx multiqueue. It also has: - the ability to enqueue/dequeue multiple items. This might become significant if packet batching is ever implemented. - an abdication mechanism to allow a thread to give up writing tx descriptors and have another if_transmit thread take over. A thread that's writing tx descriptors can end up doing so for an unbounded time period if a) there are other if_transmit threads continuously feeding the sofware queue, and b) the chip keeps up with whatever the thread is throwing at it. - accurate statistics about interesting events even when the stats come at the expense of additional branches/conditional code.
The NIC txq lock is uncontested on the fast path at this point. I've left it there for synchronization with the control events (interface up/down, modload/unload).
c) Add support for "type 1" coalescing work request in the normal NIC tx path. This work request is optimized for frames with a single item in the DMA gather list. These are very common when forwarding packets. Note that netmap tx in cxgbe already uses these "type 1" work requests.
d) Do not request automatic cidx updates every 32 descriptors. Instead, request updates via bits in individual work requests (still every 32 descriptors approximately). Also, request an automatic final update when the queue idles after activity. This means NIC tx reclaim is still performed lazily but it will catch up quickly as soon as the queue idles. This seems to be the best middle ground and I'll probably do something similar for netmap tx as well.
e) Implement a faster tx path for WRQs (used by TOE tx and control queues, _not_ by the normal NIC tx). Allow work requests to be written directly to the hardware descriptor ring if room is available. I will convert t4_tom and iw_cxgbe modules to this faster style gradually.
r276498: cxgbe(4): remove buf_ring specific restriction on the txq size.
r277225: Make cxgbe(4) buildable with the gcc in base.
r277226: Allow cxgbe(4) to be built on i386. Driver attach will succeed only on a subset of i386 systems.
r277227: Plug cxgbe(4) back into !powerpc && !arm builds, instead of building it on amd64 only.
r277230: Build cxgbe(4) on powerpc64 too.
r277637: Make sure the compiler flag to get cxgbe(4) to compile with gcc is used only when gcc is being used. This is what r277225 should have been.
|
#
284052 |
|
06-Jun-2015 |
np |
MFC r276480, r276485, r276498, r277225, r277226, r277227, r277230, r277637, and r283149 (by emaste@).
r276485 is the real change here, the rest deal with the fallout of mp_ring's reliance on 64b atomics.
Use the incorrectly spelled 'eigth' from struct pkthdr in this branch instead of MFC'ing r261733, which would have renamed the field of a public structure in a -STABLE branch. ---
r276480: Temporarily unplug cxgbe(4) from !amd64 builds.
r276485: cxgbe(4): major tx rework.
a) Front load as much work as possible in if_transmit, before any driver lock or software queue has to get involved.
b) Replace buf_ring with a brand new mp_ring (multiproducer ring). This is specifically for the tx multiqueue model where one of the if_transmit producer threads becomes the consumer and other producers carry on as usual. mp_ring is implemented as standalone code and it should be possible to use it in any driver with tx multiqueue. It also has: - the ability to enqueue/dequeue multiple items. This might become significant if packet batching is ever implemented. - an abdication mechanism to allow a thread to give up writing tx descriptors and have another if_transmit thread take over. A thread that's writing tx descriptors can end up doing so for an unbounded time period if a) there are other if_transmit threads continuously feeding the sofware queue, and b) the chip keeps up with whatever the thread is throwing at it. - accurate statistics about interesting events even when the stats come at the expense of additional branches/conditional code.
The NIC txq lock is uncontested on the fast path at this point. I've left it there for synchronization with the control events (interface up/down, modload/unload).
c) Add support for "type 1" coalescing work request in the normal NIC tx path. This work request is optimized for frames with a single item in the DMA gather list. These are very common when forwarding packets. Note that netmap tx in cxgbe already uses these "type 1" work requests.
d) Do not request automatic cidx updates every 32 descriptors. Instead, request updates via bits in individual work requests (still every 32 descriptors approximately). Also, request an automatic final update when the queue idles after activity. This means NIC tx reclaim is still performed lazily but it will catch up quickly as soon as the queue idles. This seems to be the best middle ground and I'll probably do something similar for netmap tx as well.
e) Implement a faster tx path for WRQs (used by TOE tx and control queues, _not_ by the normal NIC tx). Allow work requests to be written directly to the hardware descriptor ring if room is available. I will convert t4_tom and iw_cxgbe modules to this faster style gradually.
r276498: cxgbe(4): remove buf_ring specific restriction on the txq size.
r277225: Make cxgbe(4) buildable with the gcc in base.
r277226: Allow cxgbe(4) to be built on i386. Driver attach will succeed only on a subset of i386 systems.
r277227: Plug cxgbe(4) back into !powerpc && !arm builds, instead of building it on amd64 only.
r277230: Build cxgbe(4) on powerpc64 too.
r277637: Make sure the compiler flag to get cxgbe(4) to compile with gcc is used only when gcc is being used. This is what r277225 should have been.
|