History log of /freebsd-10-stable/sys/powerpc/conf/NOTES
Revision Date Author Comments
# 309447 02-Dec-2016 jhb

MFC 303522,303647,303860,303880,304168-304170,304479,304482,304485,305548,
305549:
Chelsio T4/T5 VF driver.

303522:
Various fixes to the t4/5nex character device.

- Remove null open/close methods.
- Don't set d_flags to 0 explicitly.
- Remove t5_cdevsw as the .d_name member isn't really used and doesn't
warrant a separate cdevsw just for the name.
- Use ENOTTY as the error value for an unknown ioctl request.
- Use make_dev_s() to close race with setting si_drv1.

303647:
Store the offset of the KDOORBELL and GTS registers in the softc.

VF devices use a different register layout than PF devices. Storing
the offset in a value in the softc allows code to be shared between the
PF and VF drivers.

303860:
Reserve an adapter flag IS_VF to mark VF devices vs PF devices.

303880:
Track the base absolute ID of ingress and egress queues.

Use this to map an absolute queue ID to a logical queue ID in interrupt
handlers. For the regular cxgbe/cxl drivers this should be a no-op as
the base absolute ID should be zero. VF devices have a non-zero base
absolute ID and require this change. While here, export the absolute ID
of egress queues via a sysctl.

304168:
Make SGE parameter handling more VF-friendly.

Add fields to hold the SGE control register and free list buffer sizes to
the sge_params structure. Populate these new fields in
t4_init_sge_params() for PF devices and change t4_read_chip_settings() to
pull these values out of the params structure instead of reading
registers directly. This will permit t4_read_chip_settings() to be reused
for VF devices which cannot read SGE registers directly.

While here, move the call to t4_init_sge_params() to
get_params__post_init(). The VF driver will populate the SGE parameters
structure via a different method before calling t4_read_chip_settings().

304169:
Update mailbox writes to work with VF devices.

- Use alternate register locations for the data and control registers for
VFs.
- Do a dummy read to force the writes to the mailbox data registers to
post before the write to the control register on VFs.
- Do not check the PCI-e firmware register for errors on VFs.

304170:
Add support for register dumps on VF devices.

- Add handling of VF register sets to t4_get_regs_len() and t4_get_regs().
- While here, use t4_get_regs_len() in the ioctl handler for regdump
instead of inlining it.

304479:
Add structures for VF-specific adapter parameters.

While here, mark which parameters are PF-specific and which are
VF-specific.

304482:
Adjust t4_port_init() to work with VF devices.

Specifically, the FW_PORT_CMD may or may not work for a VF (the PF
driver can choose whether or not to permit access to this command),
so don't attempt to fetch port information on a VF if permission is
denied by the PF.

304485:
Reorder sysctls so that nodes shared with the VF driver are added first.

This permits a single early return for VF devices in the routines that
add sysctl nodes.

305548:
Don't break out of the m_advance() loop if len drops to zero.

If a packet contains the Ethernet header (14 bytes) in the first mbuf
and the payload (IP + UDP + data) in the second mbuf, then the attempt
to fetch the l3hdr will return a NULL pointer. The first loop iteration
will drop len to zero and exit the loop without setting 'p'. However,
the desired data is at the start of the second mbuf, so the correct
behavior is to loop around and let the conditional set 'p' to m_data of
the next mbuf (and leave offset as 0).

305549:
Chelsio T4/T5 VF driver.

The cxgbev/cxlv driver supports Virtual Function devices for Chelsio
T4 and T4 adapters. The VF devices share most of their code with the
existing PF4 driver (cxgbe/cxl) and as such the VF device driver
currently depends on the PF4 driver.

Similar to the cxgbe/cxl drivers, the VF driver includes a t4vf/t5vf
PCI device driver that attaches to the VF device. It then creates
child cxgbev/cxlv devices representing ports assigned to the VF.
By default, the PF driver assigns a single port to each VF.

t4vf_hw.c contains VF-specific routines from the shared code used to
fetch VF-specific parameters from the firmware.

t4_vf.c contains the VF-specific PCI device driver and includes its
own attach routine.

VF devices are required to use a different firmware request when
transmitting packets (which in turn requires a different CPL message
to encapsulate messages). This alternate firmware request does not
permit chaining multiple packets in a single message, so each packet
results in a firmware request. In addition, the different CPL message
requires more detailed information when enabling hardware checksums,
so parse_pkt() on VF devices must examine L2 and L3 headers for all
packets (not just TSO packets) for VF devices. Finally, L2 checksums
on non-UDP/non-TCP packets do not work reliably (the firmware trashes
the IPv4 fragment field), so IPv4 checksums for such packets are
calculated in software.

Most of the other changes in the non-VF-specific code are to expose
various variables and functions private to the PF driver so that they
can be used by the VF driver.

Note that a limited subset of cxgbetool functions are supported on VF
devices including register dumps, scheduler classes, and clearing of
statistics. In addition, TOE is not supported on VF devices, only for
the PF interfaces.

Sponsored by: Chelsio Communications


# 284052 06-Jun-2015 np

MFC r276480, r276485, r276498, r277225, r277226, r277227, r277230,
r277637, and r283149 (by emaste@).

r276485 is the real change here, the rest deal with the fallout of
mp_ring's reliance on 64b atomics.

Use the incorrectly spelled 'eigth' from struct pkthdr in this branch
instead of MFC'ing r261733, which would have renamed the field of a
public structure in a -STABLE branch.
---

r276480:
Temporarily unplug cxgbe(4) from !amd64 builds.

r276485:
cxgbe(4): major tx rework.

a) Front load as much work as possible in if_transmit, before any driver
lock or software queue has to get involved.

b) Replace buf_ring with a brand new mp_ring (multiproducer ring). This
is specifically for the tx multiqueue model where one of the if_transmit
producer threads becomes the consumer and other producers carry on as
usual. mp_ring is implemented as standalone code and it should be
possible to use it in any driver with tx multiqueue. It also has:
- the ability to enqueue/dequeue multiple items. This might become
significant if packet batching is ever implemented.
- an abdication mechanism to allow a thread to give up writing tx
descriptors and have another if_transmit thread take over. A thread
that's writing tx descriptors can end up doing so for an unbounded
time period if a) there are other if_transmit threads continuously
feeding the sofware queue, and b) the chip keeps up with whatever the
thread is throwing at it.
- accurate statistics about interesting events even when the stats come
at the expense of additional branches/conditional code.

The NIC txq lock is uncontested on the fast path at this point. I've
left it there for synchronization with the control events (interface
up/down, modload/unload).

c) Add support for "type 1" coalescing work request in the normal NIC tx
path. This work request is optimized for frames with a single item in
the DMA gather list. These are very common when forwarding packets.
Note that netmap tx in cxgbe already uses these "type 1" work requests.

d) Do not request automatic cidx updates every 32 descriptors. Instead,
request updates via bits in individual work requests (still every 32
descriptors approximately). Also, request an automatic final update
when the queue idles after activity. This means NIC tx reclaim is still
performed lazily but it will catch up quickly as soon as the queue
idles. This seems to be the best middle ground and I'll probably do
something similar for netmap tx as well.

e) Implement a faster tx path for WRQs (used by TOE tx and control
queues, _not_ by the normal NIC tx). Allow work requests to be written
directly to the hardware descriptor ring if room is available. I will
convert t4_tom and iw_cxgbe modules to this faster style gradually.

r276498:
cxgbe(4): remove buf_ring specific restriction on the txq size.

r277225:
Make cxgbe(4) buildable with the gcc in base.

r277226:
Allow cxgbe(4) to be built on i386. Driver attach will succeed only on
a subset of i386 systems.

r277227:
Plug cxgbe(4) back into !powerpc && !arm builds, instead of building it
on amd64 only.

r277230:
Build cxgbe(4) on powerpc64 too.

r277637:
Make sure the compiler flag to get cxgbe(4) to compile with gcc is used
only when gcc is being used. This is what r277225 should have been.


# 266331 17-May-2014 ian

MFC 263301

In kernel config files, it is supposed to be 'options<space><tab>' not
'options<tab><tab>', per long standing (but recently not so strictly
enforced) convention.


# 284052 06-Jun-2015 np

MFC r276480, r276485, r276498, r277225, r277226, r277227, r277230,
r277637, and r283149 (by emaste@).

r276485 is the real change here, the rest deal with the fallout of
mp_ring's reliance on 64b atomics.

Use the incorrectly spelled 'eigth' from struct pkthdr in this branch
instead of MFC'ing r261733, which would have renamed the field of a
public structure in a -STABLE branch.
---

r276480:
Temporarily unplug cxgbe(4) from !amd64 builds.

r276485:
cxgbe(4): major tx rework.

a) Front load as much work as possible in if_transmit, before any driver
lock or software queue has to get involved.

b) Replace buf_ring with a brand new mp_ring (multiproducer ring). This
is specifically for the tx multiqueue model where one of the if_transmit
producer threads becomes the consumer and other producers carry on as
usual. mp_ring is implemented as standalone code and it should be
possible to use it in any driver with tx multiqueue. It also has:
- the ability to enqueue/dequeue multiple items. This might become
significant if packet batching is ever implemented.
- an abdication mechanism to allow a thread to give up writing tx
descriptors and have another if_transmit thread take over. A thread
that's writing tx descriptors can end up doing so for an unbounded
time period if a) there are other if_transmit threads continuously
feeding the sofware queue, and b) the chip keeps up with whatever the
thread is throwing at it.
- accurate statistics about interesting events even when the stats come
at the expense of additional branches/conditional code.

The NIC txq lock is uncontested on the fast path at this point. I've
left it there for synchronization with the control events (interface
up/down, modload/unload).

c) Add support for "type 1" coalescing work request in the normal NIC tx
path. This work request is optimized for frames with a single item in
the DMA gather list. These are very common when forwarding packets.
Note that netmap tx in cxgbe already uses these "type 1" work requests.

d) Do not request automatic cidx updates every 32 descriptors. Instead,
request updates via bits in individual work requests (still every 32
descriptors approximately). Also, request an automatic final update
when the queue idles after activity. This means NIC tx reclaim is still
performed lazily but it will catch up quickly as soon as the queue
idles. This seems to be the best middle ground and I'll probably do
something similar for netmap tx as well.

e) Implement a faster tx path for WRQs (used by TOE tx and control
queues, _not_ by the normal NIC tx). Allow work requests to be written
directly to the hardware descriptor ring if room is available. I will
convert t4_tom and iw_cxgbe modules to this faster style gradually.

r276498:
cxgbe(4): remove buf_ring specific restriction on the txq size.

r277225:
Make cxgbe(4) buildable with the gcc in base.

r277226:
Allow cxgbe(4) to be built on i386. Driver attach will succeed only on
a subset of i386 systems.

r277227:
Plug cxgbe(4) back into !powerpc && !arm builds, instead of building it
on amd64 only.

r277230:
Build cxgbe(4) on powerpc64 too.

r277637:
Make sure the compiler flag to get cxgbe(4) to compile with gcc is used
only when gcc is being used. This is what r277225 should have been.


# 266331 17-May-2014 ian

MFC 263301

In kernel config files, it is supposed to be 'options<space><tab>' not
'options<tab><tab>', per long standing (but recently not so strictly
enforced) convention.