History log of /freebsd-10.1-release/sys/dev/ath/if_ath_tx_edma.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 272461 02-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 250444 10-May-2013 adrian

Make sure the holding descriptor and link pointer are both freed during
a non-loss reset.

When the drain functions are called, the holding descriptor and link pointers
are NULLed out.

But when the processq function is called during a non-loss reset, this
doesn't occur. So the next time a DMA occurs, it's chained to a descriptor
that no longer exists and the hardware gets angry.

Tested:

* AR5416, STA mode; use sysctl dev.ath.X.forcebstuck=1 to force a non-loss
reset.

TODO:

* Further AR9380 testing just to check that the behaviour for the EDMA
chips is sane.

PR: kern/178477


# 250408 09-May-2013 adrian

Update the holding buffer locking for EDMA.


# 249284 08-Apr-2013 adrian

Fix this to compile when ATH_DEBUG_ALQ is defined but ATH_DEBUG isn't.


# 248779 26-Mar-2013 adrian

* Stop processing after HAL_EIO; this is what the reference driver does.
* If we hit an empty queue condition (which I haven't yet root caused, grr.)
.. make sure we release the lock before continuing.


# 248750 26-Mar-2013 adrian

Implement the replacement EDMA FIFO code.

(Yes, the previous code temporarily broke EDMA TX. I'm sorry; I should've
actually setup ATH_BUF_FIFOEND on frames so txq->axq_fifo_depth was
cleared!)

This code implements a whole bunch of sorely needed EDMA TX improvements
along with CABQ TX support.

The specifics:

* When filling/refilling the FIFO, use the new TXQ staging queue
for FIFO frames

* Tag frames with ATH_BUF_FIFOPTR and ATH_BUF_FIFOEND correctly.
For now the non-CABQ transmit path pushes one frame into the TXQ
staging queue without setting up the intermediary link pointers
to chain them together, so draining frames from the txq staging
queue to the FIFO queue occurs AMPDU / MPDU at a time.

* In the CABQ case, manually tag the list with ATH_BUF_FIFOPTR and
ATH_BUF_FIFOEND so a chain of frames is pushed into the FIFO
at once.

* Now that frames are in a FIFO pending queue, we can top up the
FIFO after completing a single frame. This means we can keep
it filled rather than waiting for it drain and _then_ adding
more frames.

* The EDMA restart routine now walks the FIFO queue in the TXQ
rather than the pending queue and re-initialises the FIFO with
that.

* When restarting EDMA, we may have partially completed sending
a list. So stamp the first frame that we see in a list with
ATH_BUF_FIFOPTR and push _that_ into the hardware.

* When completing frames, only check those on the FIFO queue.
We should never ever queue frames from the pending queue
direct to the hardware, so there's no point in checking.

* Until I figure out what's going on, make sure if the TXSTATUS
for an empty queue pops up, complain loudly and continue.
This will stop the panics that people are seeing. I'll add
some code later which will assist in ensuring I'm populating
each descriptor with the correct queue ID.

* When considering whether to queue frames to the hardware queue
directly or software queue frames, make sure the depth of
the FIFO is taken into account now.

* When completing frames, tag them with ATH_BUF_BUSY if they're
not the final frame in a FIFO list. The same holding descriptor
behaviour is required when handling descriptors linked together
with a link pointer as the hardware will re-read the previous
descriptor to refresh the link pointer before contiuning.

* .. and if we complete the FIFO list (ie, the buffer has
ATH_BUF_FIFOEND set), then we don't need the holding buffer
any longer. Thus, free it.

Tested:

* AR9380/AR9580, STA and hostap
* AR9280, STA/hostap

TODO:

* I don't yet trust that the EDMA restart routine is totally correct
in all circumstances. I'll continue to thrash this out under heavy
multiple-TXQ traffic load and fix whatever pops up.


# 248717 26-Mar-2013 adrian

Remove the mcast path calls to ath_hal_gettxdesclinkptr() for axq_link -
they're no longer needed for the legacy path and they're not wanted
for the EDMA path.

Tested:

* AR9280, hostap + CABQ
* AR9380/AR9580, hostap + CABQ


# 248716 26-Mar-2013 adrian

Remove this dead code - it's no longer relevant (as yes, we do actually
support TX on EDMA chips.)


# 248714 26-Mar-2013 adrian

Convert the EDMA multicast queue code over to use the HAL method to set
the descriptor link pointer, rather than directly.

This is needed on AR9380 and later (ie, EDMA) NICs so the multicast queue
has a chance in hell of being put together right.

Tested:

* AR9380, AR9580 in hostap mode, CABQ traffic (but with other patches..)


# 248675 24-Mar-2013 adrian

Fix the locking changes due to the TXQ change drive-by.

Tested:

* AR9580, STA mode


# 248671 23-Mar-2013 adrian

Overhaul the TXQ locking (again!) as part of some beacon/cabq timing
related issues.

Moving the TX locking under one lock made things easier to progress on
but it had one important side-effect - it increased the latency when
handling CABQ setup when sending beacons.

This commit introduces a bunch of new changes and a few unrelated changs
that are just easier to lump in here.

The aim is to have the CABQ locking separate from other locking.
The CABQ transmit path in the beacon process thus doesn't have to grab
the general TX lock, reducing lock contention/latency and making it
more likely that we'll make the beacon TX timing.

The second half of this commit is the CABQ related setup changes needed
for sane looking EDMA CABQ support. Right now the EDMA TX code naively
assumes that only one frame (MPDU or A-MPDU) is being pushed into each
FIFO slot. For the CABQ this isn't true - a whole list of frames is
being pushed in - and thus CABQ handling breaks very quickly.

The aim here is to setup the CABQ list and then push _that list_ to
the hardware for transmission. I can then extend the EDMA TX code
to stamp that list as being "one" FIFO entry (likely by tagging the
last buffer in that list as "FIFO END") so the EDMA TX completion code
correctly tracks things.

Major:

* Migrate the per-TXQ add/removal locking back to per-TXQ, rather than
a single lock.

* Leave the software queue side of things under the ATH_TX_LOCK lock,
(continuing) to serialise things as they are.

* Add a new function which is called whenever there's a beacon miss,
to print out some debugging. This is primarily designed to help
me figure out if the beacon miss events are due to a noisy environment,
issues with the PHY/MAC, or other.

* Move the CABQ setup/enable to occur _after_ all the VAPs have been
looked at. This means that for multiple VAPS in bursted mode, the
CABQ gets primed once all VAPs are checked, rather than being primed
on the first VAP and then having frames appended after this.

Minor:

* Add a (disabled) twiddle to let me enable/disable cabq traffic.
It's primarily there to let me easily debug what's going on with beacon
and CABQ setup/traffic; there's some DMA engine hangs which I'm finally
trying to trace down.

* Clear bf_next when flushing frames; it should quieten some warnings
that show up when a node goes away.

Tested:

* AR9280, STA/hostap, up to 4 vaps (staggered)
* AR5416, STA/hostap, up to 4 vaps (staggered)

TODO:

* (Lots) more AR9380 and later testing, as I may have missed something here.
* Leverage this to fix CABQ hanling for AR9380 and later chips.
* Force bursted beaconing on the chips that default to staggered beacons and
ensure the CABQ stuff is all sane (eg, the MORE bits that aren't being
correctly set when chaining descriptors.)


# 248543 20-Mar-2013 adrian

Fix the EDMA CABQ handling - for now, the CABQ takes a descriptor chain
like the legacy chips expect.


# 246450 07-Feb-2013 adrian

Methodize the process of adding the software TX queue to the taskqueue.

Move it (for now) to the TX taskqueue.


# 245927 25-Jan-2013 adrian

Migrate the TX sending code out from under the ath0 taskq and into
the separate ath0 TX taskq.

Whilst here, make sure that the TX software scheduler is also
running out of the TX task, rather than the ath0 taskqueue.

Make sure that the tx taskqueue is blocked/unblocked as necessary.

This allows for a little more parallelism on multi-core machines,
as well as (eventually) supporting a higher task priority for TX
tasks, allowing said TX task to preempt an already running RX or
TX completion task.

Tested:

* AR5416, AR9280 hostap and STA modes


# 243786 02-Dec-2012 adrian

Delete the per-TXQ locks and replace them with a single TX lock.

I couldn't think of a way to maintain the hardware TXQ locks _and_ layer
on top of that per-TXQ software queuing and any other kind of fine-grained
locks (eg per-TID, or per-node locks.)

So for now, to facilitate some further code refactoring and development
as part of the final push to get software queue ps-poll and u-apsd handling
into this driver, just do away with them entirely.

I may eventually bring them back at some point, when it looks slightly more
architectually cleaner to do so. But as it stands at the present, it's
not really buying us much:

* in order to properly serialise things and not get bitten by scheduling
and locking interactions with things higher up in the stack, we need to
wrap the whole TX path in a long held lock. Otherwise we can end up
being pre-empted during frame handling, resulting in some out of order
frame handling between sequence number allocation and encryption handling
(ie, the seqno and the CCMP IV get out of sequence);

* .. so whilst that's the case, holding the lock for that long means that
we're acquiring and releasing the TXQ lock _inside_ that context;

* And we also acquire it per-frame during frame completion, but we currently
can't hold the lock for the duration of the TX completion as we need
to call net80211 layer things with the locks _unheld_ to avoid LOR.

* .. the other places were grab that lock are reset/flush, which don't happen
often.

My eventual aim is to change the TX path so all rejected frame transmissions
and all frame completions result in any ieee80211_free_node() calls to occur
outside of the TX lock; then I can cut back on the amount of locking that
goes on here.

There may be some LORs that occur when ieee80211_free_node() is called when
the TX queue path fails; I'll begin to address these in follow-up commits.


# 243163 16-Nov-2012 adrian

* Remove a duplicate TX ALQ post routine!
* For CABQ traffic, I -can- chain them together using the next pointer
and just push that particular chain head to the CABQ. However, this
doesn't magically make EDMA TX CABQ work - I have to do some further
hoop jumping.


# 242880 10-Nov-2012 adrian

Correct some rather weird and broken behaviour observed when doing
actual traffic with an AR9380/AR9382/AR9485.

The sample rate control stats would show impossibly large numbers for
"successful packets transmitted." The number was a tad under 2^^64-1.
So after a bit of digging, I found that the sample rate control code
was making 'tries' turn into a negative number.. and this was because
ts_longretry was too small.

The hardware returns "ts_longretry" at the current rate selection,
not overall for that TX descriptor. So if you setup four TX rate
scenarios and the second one works, ts_longretry is only set for
the number of attempts at that second rate scenario. The FreeBSD HAL
code does the correction in ath_hal_proctxdesc() - however, this isn't
possible with EDMA.

EDMA TX completion is done separate from the original TX descriptor.
So the real solution is to split out "find ts_rate and ts_longretry"
from "complete TX descriptor". Until that's done, put a hack in
the EDMA TX path that uses the rate scenario information in the ath_buf.

Tested: AR9380, AR9382, AR9485 STA mode


# 242782 08-Nov-2012 adrian

Add some hooks into the driver to attach, detach and record EDMA descriptor
events.

This is primarily for the TX EDMA and TX EDMA completion. I haven't yet
tied it into the EDMA RX path or the legacy TX/RX path.

Things that I don't quite like:

* Make the pointer type 'void' in ath_softc and have if_ath_alq*()
return a malloc'ed buffer. That would remove the need to include
if_ath_alq.h in if_athvar.h.
* The sysctl setup needs to be cleaned up.


# 242780 08-Nov-2012 adrian

Oops, fix bogus spacing.


# 242779 08-Nov-2012 adrian

Implement the ATH_RESET_NOLOSS path for TX stop and start; this is needed
for 802.11n TX device restarting.

Remove the debug printf()s; they're no longer needed here.


# 242604 05-Nov-2012 adrian

Clear IFF_DRV_OACTIVE if any slots were completed.

This unblocks TX EDMA under high load.


# 242540 04-Nov-2012 adrian

Oops - conditionalise that.


# 242532 03-Nov-2012 adrian

EDMA TX tweaks:

* don't poke ath_hal_txstart() if nothing was pushed into the FIFO during
the refill process;

* shuffle around the TX debugging output a little so it's logged at
TX hardware enqueue;

* Add logging of the TX status processing.


# 239504 21-Aug-2012 adrian

Initialise an uninitialised variable.

GCC on -9 didn't pick this up; clang did.

Submitted by: David Wolfskill


# 239410 20-Aug-2012 adrian

Flesh out some initial EDMA TX FIFO fill, complete and refill routines.

Note: This is totally sub-optimal and a work in progress.

* Support filling an empty FIFO TXQ with frames from the ath_buf queue
in the ath_txq list. However, since there's (currently) no clean, easy
way to separate the frames that are in the FIFO versus just waiting,
the code waits for the FIFO to be totally empty before it attempts to
queue more. This is highly sub-optimal but is enough to get the ball
rolling.

* A _lot_ of the code assumes that the TX status is filled out in the
struct ath_buf bf_status field. So for now, memcpy() the completion over.

* None of the TX drain / reset routines will attempt to complete completed
frames before draining, so it can't be used for 802.11n TX aggregation.
(This won't work anyway, as the aggregation TX descriptor API hasn't
yet been converted; and that'll happen in some future commits.)

* Fix an issue where the FIFO counter wasn't being incremented, leading
to the queue logic just plain not working.

* HAL_EIO means "descriptor wasn't valid", versus "not finished, don't
continue." So don't stop processing descriptors when HAL_EIO is hit.

* Don't service frame completion from the beacon queue. It isn't currently
fully setup like a real queue and the first attempt at accessing the
queue lock will panic the kernel.

Tested:

* AR9380, STA mode

This commit is brought to you by said AR9380 in STA mode.


# 239205 11-Aug-2012 adrian

Revert the ath_tx_draintxq() method, and instead teach it the minimum
necessary to "do" EDMA.

It was just using the TX completion status for logging information about
the descriptor completion. Since with EDMA we don't know this without
checking the TX completion FIFO, we can't provide this information.
So don't.


# 239204 11-Aug-2012 adrian

Break out ath_draintxq() into a method and un-methodize ath_tx_processq().

Now that I understand what's going on with this, I've realised that
it's going to be quite difficult to implement a processq method in
the EDMA case. Because there's a separate TX status FIFO, I can't
just run processq() on each EDMA TXQ to see what's finished.
i have to actually run the TX status queue and handle individual
TXQs.

So:

* unmethodize ath_tx_processq();
* leave ath_tx_draintxq() as a method, as it only uses the completion status
for debugging rather than actively completing the frames (ie, all frames
here are failed);
* Methodize ath_draintxq().

The EDMA ath_draintxq() will have to take care of running the TX
completion FIFO before (potentially) freeing frames in the queue.

The only two places where ath_tx_draintxq() (on a single TXQ) are used:

* ath_draintxq(); and
* the CABQ handling in the beacon setup code - it drains the CABQ before
populating the CABQ with frames for a new beacon (when doing multi-VAP
operation.)

So it's quite possible that once I methodize the CABQ and beacon handling,
I can just drop ath_tx_draintxq() in its entirety.

Finally, it's also quite possible that I can remove ath_tx_draintxq()
in the future and just "teach" it to not check the status when doing
EDMA.


# 239197 11-Aug-2012 adrian

Begin fleshing out the TX FIFO support.

* Add ATH_TXQ_FIRST() for easy tasting of what's on the list;
* Add an "axq_fifo_depth" for easy tracking of how deep the current
FIFO is;
* Flesh out the handoff (mcast, hw) functions;
* Begin fleshing out a TX ISR proc, which tastes the TX status FIFO.

The legacy hardware stuffs the TX completion at the end of the final frame
descriptor (or final sub-frame when doing aggregate.) So it's feasible
to do a per-TXQ drain and process, as the needed info is right there.

For EDMA hardware, there's a separate TX completion FIFO. So the TX
process routine needs to read the single FIFO and then process the
frames in each hardware queue.

This makes it difficult to do a per-queue process, as you'll end up with
frames in the TX completion FIFO for a different TXQ to the one you've
passed to ath_tx_draintxq() or ath_tx_processq().

Testing:

I've tested the TX queue and TX completion code in hostap mode on an
AR9380. Beacon frames successfully transmit and the completion routine
is called. Occasional data frames end up in TXQ 1 and are also
successfully completed.

However, this requires some changes to the beacon code path as:

* The AR9380 beacon configuration API is now in TU/8, rather than
TU;
* The AR9380 TX API requires the rate control is setup using a call
to setup11nratescenario, rather than having the try0 series setup
(rate/tries for the first series); so the beacon won't go out.

I'll follow this up with commits to the beacon code.


# 238931 31-Jul-2012 adrian

Migrate some more TX side setup routines to be methods.


# 238930 31-Jul-2012 adrian

Break out the hardware handoff and TX DMA restart code into methods.

These (and a few others) will differ based on the underlying DMA
implementation.

For the EDMA NICs, simply stub them out in a fashion which will let
me focus on implementing the necessary descriptor API changes.


# 238855 28-Jul-2012 adrian

Flesh out the initial TX FIFO storage for each hardware TX queue.


# 238836 27-Jul-2012 adrian

Allocate a descriptor ring for EDMA TX completion status.

Configure the hardware with said ring physical address and size.


# 238710 23-Jul-2012 adrian

Begin separating out the TX DMA setup in preparation for TX EDMA support.

* Introduce TX DMA setup/teardown methods, mirroring what's done in
the RX path.

Although the TX DMA descriptor is setup via ath_desc_alloc() /
ath_desc_free(), there TX status descriptor ring will be allocated
in this path.

* Remove some of the TX EDMA capability probing from the RX path and
push it into the new TX EDMA path.