History log of /freebsd-current/sys/dev/ath/if_ath_tx_ht.c
Revision Date Author Comments
# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# e4b7980c 19-Jan-2023 Gordon Bergling <gbe@FreeBSD.org>

ath(4): Fix a sysctl description and a typo in a comment

- s/delimeter/delimiter/

MFC after: 7 days


# fe5ebb23 24-Sep-2020 Bjoern A. Zeeb <bz@FreeBSD.org>

Provide MS() and SM() macros for 80211 and wireless drivers.

We have (two versions) of MS() and SM() macros which we use throughout
the wireless code. Change all but three places (ath_hal, rtwn, and rsu)
to the newly provided _IEEE80211_MASKSHIFT() and _IEEE80211_SHIFTMASK()
macros. Also change one internal case using both _S and _M instead of
just _S away from _M (one of the reasons rtwn and rsu were not changed).

This was done semi-mechanically. No functional changes intended.

Requested by: gnn (D26091)
Reviewed by: adrian (pre line wrap)
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")
Differential Revision: https://reviews.freebsd.org/D26539


# 9966c0f9 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

ath: clean up empty lines in .c and .h files


# cce63444 15-May-2020 Adrian Chadd <adrian@FreeBSD.org>

[ath] [ath_rate] Extend ath_rate_sample to better handle 11n rates and aggregates.

My initial rate control code was .. suboptimal. I wanted to at least get MCS
rates sent, but it didn't do anywhere near enough to handle low signal level links
or remotely keep accurate statistics.

So, 8 years later, here's what I should've done back then.

* Firstly, I wasn't at all tracking packet sizes other than the two buckets
(250 and 1600 bytes.) So, extend it to include 4096, 8192, 16384, 32768 and
65536. I may go add 2048 at some point if I find it's useful.

This is important for a few reasons. First, when forming A-MPDU or AMSDU
aggregates the frame sizes are larger, and thus the TX time calculation
is woefully, increasingly wrong. Secondly, the behaviour of 802.11 channels
isn't some fixed thing, both due to channel conditions and radios themselves.
Notably, there was some observations done a few years ago on 11n chipsets
which noticed longer aggregates showed an increase in failed A-MPDU sub-frame
reception as you got further along in the transmit time. It could be due to
a variety of things - transmitter linearity, channel conditions changing,
frequency/phase drift, etc - but the observation was to potentially form
shorter aggregates to improve BER.

* .. and then modify the ath TX path to report the length of the aggregate sent,
so as the statistics kept would line up with the correct bucket.

* Then on the rate control look-up side - i was also only using the first frame
length for an A-MPDU rate control lookup which isn't good enough here.
So, add a new method that walks the TID software queue for that node to
find out what the likely length of data available is. It isn't ALL of the
data in the queue because we'll only ever send enough data to fit inside the
block-ack window, so limit how many bytes we return to roughly what ath_tx_form_aggr()
would do.

* .. and cache that in the first ath_buf in the aggregate so it and the eventual
AMPDU length can be returned to the rate control code.

* THEN, modify the rate control code to look at them both when deciding which bucket
to attribute the sent frame on. I'm erring on the side of caution and using the
size bucket that the lookup is based on.

Ok, so now the rate lookups and statistics are "more correct". However, MCS rates
are not the same as 11abg rates in that they're not a monotonically incrementing
set of faster rates and you can't assume that just because a given MCS rate fails,
the next higher one wouldn't work better or be a lower average tx time.

So, I had to do a bunch of surgery to the best rate and sample rate math.
This is the bit that's a WIP.

* First, simplify the statistics updates (update_stats()) to do a single pass on
all rates.
* Next, make sure that each rate average tx time is updated based on /its/ failure/success.
Eg if you sent a frame with { MCS15, MCS12, MCS8 } and MCS8 succeeded, MCS15 and MCS
12 would have their average tx time updated for /their/ part of the transmission,
not the whole transmission.
* Next, EWMA wasn't being fully calculated based on the /failures/ in each of the
rate attempts. So, if MCS15, MCS12 failed above but MCS8 didn't, then ensure
that the statistics noted that /all/ subframes failed at those rates, rather than
the eventual set of transmitted/sent frames. This ensures the EWMA /and/ average
TX time are updated correctly.
* When picking a sample rate and initial rate, probe rates aroud the current MCS
but limit it to MCS0..7 /for all spatial streams/, rather than doing crazy things
like hitting MCS7 and then probing MCS8 - MCS8 is basically MCS0 but two spatial
streams. It's a /lot/ slower than MCS7. Also, the reverse is true - if we're at
MCS8 then don't probe MCS7 as part of it, it's not likely to succeed.
* Fix bugs in pick_best_rate() where I was /immediately/ choosing the highest MCS
rate if there weren't any frames yet transmitted. I was defaulting to 25% EWMA and
.. then each comparison would accept the higher rate. Just skip those; sampling
will fill in the details.

So, this seems to work a lot better. It's not perfect; I'm still seeing a lot of
instability around higher MCS rates because there are bursts of loss/retransmissions
that aren't /too/ bad. But i'll keep iterating over this and tidying up my hacks.

Ok, so why this still something I'm poking at? rather than porting minstrel_ht?

ath_rate_sample tries to minimise airtime, not maximise throughput. I have
extended it with an EWMA based on sub-frame success/failures - high MCS rates
that have partially successful receptions still show super short average frame
times, but a /lot/ of retransmits have to happen for that to work.
So for MCS rates I also track this EWMA and ensure that the rates I'm choosing
don't have super crappy packet failures. I don't mind not getting lower
peak throughput versus minstrel_ht; instead I want to see if I can make "minimise
airtime" work well.

Tested:

* AR9380, STA mode
* AR9344, STA mode
* AR9580, STA/AP mode


# 718cf2cc 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/dev: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# a2d74cc3 21-Jan-2017 Adrian Chadd <adrian@FreeBSD.org>

[ath] only apply the AR9300 delimiter workaround for the first sub-frame.

This is supposed to only be applied to the first subframe and only if
RTS/CTS is being done. I'm still not yet checking RTS/CTS exchange status
so it's just happening for all subframes on AR9380 and later.

This gets MCS23 throughput up from around 250mbit to 303mbit with RTS/CTS
protection enabled, and around 330mbit with no HT protection enabled.

Now, MCS23 has a PHY rate of 450mbit and we should be seeing closer to
400mbit for a straight one-way UDP test, but this beats the previous
maximum throughput.

Tested:

* AR9380 (STA) -> AR9580 (AP) - STA with the modifications, doing UDP TX
test using iperf.


# 28dc144e 21-Jan-2017 Adrian Chadd <adrian@FreeBSD.org>

[ath] improve the debugging when looking into the maximum A-MPDU size being chosen.

This is how I caught the "why are we only sending 8K aggregates?" problem.


# 8f1e1139 21-Jan-2017 Andriy Voskoboinyk <avos@FreeBSD.org>

ath: adapt LDPC support checks

Set both IEEE80211_HTCAP_LDPC and IEEE80211_HTC_TXLDPC capability flags
if LDPC is supported + set 'do_ldpc = 1' only when it is not disabled,
not just supported.

Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D9277


# 707210ff 02-Dec-2016 Adrian Chadd <adrian@FreeBSD.org>

[ath] use the correct AMPDU frame limit for the given node, rather than the global config.

This is important in hostap, ibss, (11s at some magical future date, etc)
where different nodes may have smaller limits.

Oops!

MFC after: 1 week
Relnotes: Yes


# c969f82d 21-Nov-2016 Adrian Chadd <adrian@FreeBSD.org>

[ath] obey the peer A-MPDU density and max-size.

* Obey the peer A-MPDU density if it's larger than the currently configured
one.

* Pay attention to the peer A-MPDU max-size and don't assume we can transmit
a full A-MPDU (64k!) if the peer announces smaller values.

Relnotes: ath(4): Fix A-MPDU transmit; obey A-MPDU density and max size.


# ffa25bec 18-Jul-2016 Adrian Chadd <adrian@FreeBSD.org>

[ath] don't do LDPC, STBC or short-gi for locationing frames.

The 11n duration calculation function in net80211 and the HAL round /up/
the duration calculation for short-gi, so we can't use that.

The 11n duration calculation doesn't know about the extra symbol time
needed for STBC, nor the LDPC encoding duration, so we can't use
that.

This (along with other, local hacks) allow the locationing services to
get down to around 200nS (yes, nanoseconds) of variance when speaking
to a "good" AP.

Tested:

* AR9380, STA mode, local locationing frame hacks


# 7ff1939d 15-Jul-2016 Adrian Chadd <adrian@FreeBSD.org>

[ath] [ath_hal] break out the duration calculation to optionally include SIFS.

The pre-11n calculations include SIFS, but the 11n ones don't.

The reason is that (mostly) the 11n hardware is doing the SIFS calculation
for us but the pre-11n hardware isn't. This means that we're over-shooting
the times in the duration field for non-11n frames on 11n hardware, which
is OK, if not a little inefficient.

Now, this is all fine for what the hardware needs for doing duration math
for ACK, RTS/CTS, frame length, etc, but it isn't useful for doing PHY
duration calculations. Ie, given a frame to TX and its timestamp, what
would the end of the actual transmission time be; and similar for an
RX timestamp and figuring out its original length.

So, this adds a new field to the duration routines which requests
SIFS or no SIFS to be included. All the callers currently will call
it requesting SIFS, so this /should/ be a glorious no-op. I'm however
planning some future work around airtime fairness and positioning which
requires these routines to have SIFS be optional.

Notably though, the 11n version doesn't do any SIFS addition at the moment.
I'll go and tweak and verify all of the packet durations before I go and
flip that part on.

Tested:

* AR9330, STA mode
* AR9330, AP mode
* AR9380, STA mode


# a7038bd1 07-Jul-2016 Adrian Chadd <adrian@FreeBSD.org>

[ath] obey the short-GI vap config flag when transmitting.

This makes 'ifconfig wlanX -shortgi' work correctly.

Tested:

* AR9380, STA mode

Approved by: re (gjb)


# a0391187 29-Apr-2016 Adrian Chadd <adrian@FreeBSD.org>

[ath] initialise do_ldpc to 0.

I .. can't believe I missed this.

This showed up because the AP was TX'ing LDPC to an iwm(4) chipset,
which didn't advertise LDPC and doesn't /accept/ LDPC. Amusingly, all
the two other FreeBSD 11n parts I had tested with (AR9380, Intel 7260)
and I completely forgot to test on ye olde hardware.

That'll teach me.

Tested:

* AR9580 (AP) - Intel 7260 (STA), AR9380 (STA), Intel 6205 (STA)


# ce725c8e 28-Apr-2016 Adrian Chadd <adrian@FreeBSD.org>

[ath] Add LDPC transmit support.

LDPC adds better transmit reliability if both ends support it.

You in theory can do both STBC and LDPC at the same time.
If I see issues I'll disable it.

* Only enable it if both ends of a connection negotiate it.
* Disable it if any rate is non-11n.
* Count both LDPC TX and STBC TX.

Tested:

* AR9380, STA mode


# f026d693 25-Apr-2016 Adrian Chadd <adrian@FreeBSD.org>

[ath] obey the STBC flag setting in iv_flags_ht

Add support for the FHT_STBC_TX flag in iv_flags_ht, so it'll now obey
the per-vap ifconfig stbctx flag.

This means that we can do STBC TX on one vap and not another VAP.
(As well as STBC RX on said vap; that changes the HTCAP announcement.)


# 19b34b56 27-Nov-2015 Adrian Chadd <adrian@FreeBSD.org>

wrap in ATH_DEBUG.

Thanks sparc64 build!


# 20b0b9ea 27-Nov-2015 Adrian Chadd <adrian@FreeBSD.org>

[ath] conditionally print out the rate series information if ATH_DEBUG_XMIT is set.


# 22a3aee6 15-May-2013 Adrian Chadd <adrian@FreeBSD.org>

Implement my first cut at "correct" node power-save and
PS-POLL support.

This implements PS-POLL awareness i nthe

* Implement frame "leaking", which allows for a software queue
to be scheduled even though it's asleep
* Track whether a frame has been leaked or not
* Leak out a single non-AMPDU frame when transmitting aggregates
* Queue BAR frames if the node is asleep
* Direct-dispatch the rest of control and management frames.
This allows for things like re-association to occur (which involves
sending probe req/resp as well as assoc request/response) when
the node is asleep and then tries reassociating.
* Limit how many frames can set in the software node queue whilst
the node is asleep. net80211 is already buffering frames for us
so this is mostly just paranoia.
* Add a PS-POLL method which leaks out a frame if there's something
in the software queue, else it calls net80211's ps-poll routine.
Since the ath PS-POLL routine marks the node as having a single frame
to leak, either a software queued frame would leak, OR the next queued
frame would leak. The next queued frame could be something from the
net80211 power save queue, OR it could be a NULL frame from net80211.

TODO:

* Don't transmit further BAR frames (eg via a timeout) if the node is
currently asleep. Otherwise we may end up exhausting management frames
due to the lots of queued BAR frames.

I may just undo this bit later on and direct-dispatch BAR frames
even if the node is asleep.

* It would be nice to burst out a single A-MPDU frame if both ends
support this. I may end adding a FreeBSD IE soon to negotiate
this power save behaviour.

* I should make STAs timeout of power save mode if they've been in power
save for more than a handful of seconds. This way cards that get
"stuck" in power save mode don't stay there for the "inactivity" timeout
in net80211.

* Move the queue depth check into the driver layer (ath_start / ath_transmit)
rather than doing it in the TX path.

* There could be some naughty corner cases with ps-poll leaking.
Specifically, if net80211 generates a NULL data frame whilst another
transmitter sends a normal data frame out net80211 output / transmit,
we need to ensure that the NULL data frame goes out first.
This is one of those things that should occur inside the VAP/ic TX lock.
Grr, more investigations to do..

Tested:

* STA: AR5416, AR9280
* AP: AR5416, AR9280, AR9160


# de00e5cb 17-Apr-2013 Adrian Chadd <adrian@FreeBSD.org>

Update the rate series setup code to use the decisions already made in
ath_tx_rate_fill_rcflags(). Include setting up the TX power cap in the
rate scenario setup code being passed to the HAL.

Other things:

* add a tx power cap field in ath_rc.
* Add a three-stream flag in ath_rc.
* Delete the LDPC flag from ath_rc - it's not a per-rate flag, it's a
global flag for the transmission.


# 7a27f0a3 28-Feb-2013 Adrian Chadd <adrian@FreeBSD.org>

Oops - fix an incorrect test.


# 56129906 28-Feb-2013 Adrian Chadd <adrian@FreeBSD.org>

Don't enable the HT flags for legacy rates.

I stumbled across this whilst trying to debug another weird hang reported
on the freebsd-wireless list.

Whilst here, add in the STBC check to ath_rateseries_setup().

Whilst here, fix the short preamble flag to be set only for legacy rates.

Whilst here, comment that we should be using the full set of decisions
made by ath_rateseries_setup() rather than recalculating them!


# 1a3a5607 26-Feb-2013 Adrian Chadd <adrian@FreeBSD.org>

Enable STBC for the given rate series if it's negotiated:

* If both ends have negotiated (at least) one stream;
* Only if it's a single stream rate (MCS0-7);
* Only if there's more than one TX chain enabled.

Tested:

* AR9280 STA mode -> Atheros AP; tested both MCS2 (STBC) and MCS12 (no STBC.)
Verified using athalq to inspect the TX descriptors.

TODO:

* Test AR5416 - no STBC should be enabled;
* Test AR9280 with one TX chain enabled - no STBC should be enabled.


# 6322256b 25-Feb-2013 Adrian Chadd <adrian@FreeBSD.org>

Part #2 of the TX chainmask changes:

* Remove ar5416UpdateChainmasks();
* Remove the TX chainmask override code from the ar5416 TX descriptor
setup routines;
* Write a driver method to calculate the current chainmask based on the
operating mode and update the driver state;
* Call the HAL chainmask method before calling ath_hal_reset();
* Use the currently configured chainmask in the TX descriptors rather than
the hardware TX chainmasks.

Tested:

* AR5416, STA/AP mode - legacy and 11n modes


# a54ecf78 20-Feb-2013 Adrian Chadd <adrian@FreeBSD.org>

Add an option to allow the minimum number of delimiters to be tweaked.

This is primarily for debugging purposes.

Tested:

* AR5416, STA mode


# 4a502c33 20-Feb-2013 Adrian Chadd <adrian@FreeBSD.org>

Add a new option to limit the maximum size of aggregates.
The default is to limit them to what the hardware is capable of.

Add sysctl twiddles for both the non-RTS and RTS protected aggregate
generation.

Whilst here, add some comments about stuff that I've discovered during
my exploration of the TX aggregate / delimiter setup path from the
reference driver.


# d03904f1 08-Feb-2013 Adrian Chadd <adrian@FreeBSD.org>

Fix a corner case that I noticed with the AR5416 (and it's currently
crappy 802.11n performance, sigh.)

With the AR5416, aggregates need to be limited to 8KiB if RTS/CTS is
enabled. However, larger aggregates were going out with RTSCTS enabled.
The following was going on:

* The first buffer in the list would have RTS/CTS enabled in
bf->bf_state.txflags;
* The aggregate would be formed;
* The "copy over the txflags from the first buffer" logic that I added
blanked the RTS/CTS TX flags fields, and then copied the bf_first
RTS/CTS flags over;
* .. but that'd cause bf_first to be blanked out! And thus the flag
was cleared;
* So the rest of the aggregate formation would run with those flags
cleared, and thus > 8KiB aggregates were formed.

The driver is now (again) correctly limiting aggregate formation for
the AR5416 but there are still other pending issues to resolve.

Tested:

* AR5416, STA mode


# 375307d4 01-Dec-2012 Adrian Chadd <adrian@FreeBSD.org>

Delete the per-TXQ locks and replace them with a single TX lock.

I couldn't think of a way to maintain the hardware TXQ locks _and_ layer
on top of that per-TXQ software queuing and any other kind of fine-grained
locks (eg per-TID, or per-node locks.)

So for now, to facilitate some further code refactoring and development
as part of the final push to get software queue ps-poll and u-apsd handling
into this driver, just do away with them entirely.

I may eventually bring them back at some point, when it looks slightly more
architectually cleaner to do so. But as it stands at the present, it's
not really buying us much:

* in order to properly serialise things and not get bitten by scheduling
and locking interactions with things higher up in the stack, we need to
wrap the whole TX path in a long held lock. Otherwise we can end up
being pre-empted during frame handling, resulting in some out of order
frame handling between sequence number allocation and encryption handling
(ie, the seqno and the CCMP IV get out of sequence);

* .. so whilst that's the case, holding the lock for that long means that
we're acquiring and releasing the TXQ lock _inside_ that context;

* And we also acquire it per-frame during frame completion, but we currently
can't hold the lock for the duration of the TX completion as we need
to call net80211 layer things with the locks _unheld_ to avoid LOR.

* .. the other places were grab that lock are reset/flush, which don't happen
often.

My eventual aim is to change the TX path so all rejected frame transmissions
and all frame completions result in any ieee80211_free_node() calls to occur
outside of the TX lock; then I can cut back on the amount of locking that
goes on here.

There may be some LORs that occur when ieee80211_free_node() is called when
the TX queue path fails; I'll begin to address these in follow-up commits.


# 64dbfc6d 03-Nov-2012 Adrian Chadd <adrian@FreeBSD.org>

For AR9380 NICs - the non-enterprise versions don't support RTS protection
of small (< 256 byte) aggregate frames.

This needs to be done or 11n aggregation TX just simply doesn't work
on these NICs.

Whilst here, extend some debug printing; I was using this whilst
debugging the TX power setup in the TX descriptor(s) on the AR9380.


# 3e6cc97f 07-Oct-2012 Adrian Chadd <adrian@FreeBSD.org>

Migrate the TID TXQ accesses to a new set of macros, rather than reusing
the ATH_TXQ_* macros.

* Introduce the new macros;
* rename the TID queue and TID filtered frame queue so the compiler
tells me I'm using the wrong macro.

These should correspond 1:1 to the existing code.


# 76af1a93 07-Sep-2012 Adrian Chadd <adrian@FreeBSD.org>

Correctly mask out the RTS/CTS flags when forming aggregates.

This had the side effect of clearing HAL_TXDESC_CLRDMASK for a bunch of
frames, meaning they'd end up being potentially filtered if there were
an error. This is fine in the previous world as they'd just be
software retried but now that I'm working on filtered frames, these
descriptors would be endlessly retried until another valid frame would
come along that had CLRDMASK set.


# 8c08c07a 31-Jul-2012 Adrian Chadd <adrian@FreeBSD.org>

Shuffle the call to ath_hal_setuplasttxdesc() to _after_ the rate control
code is called and remove it from ath_buf_set_rate().

For the legacy (non-11n API) TX routines, ath_hal_filltxdesc() takes care
of setting up the intermediary and final descriptors right, complete
with copying the rate control info into the final descriptor so the
rate modules can grab it.

The 11n version doesn't do this - ath_hal_chaintxdesc() doesn't
copy the rate control bits over, nor does it clear isaggr/moreaggr/
pad delimiters. So the call to setuplasttxdesc() is needed here.

So:

* legacy NICs - never call the 11n rate control stuff, so filltxdesc
copies the rate control info right;
* 11n NICs transmitting legacy or 11n non-aggregate frames -
ath_hal_set11nratescenario() is called to setup rate control and
then ath_hal_filltxdesc() chains them together - so the rate control
info is right;
* 11n aggregate frames - set11nratescenario() is called, then
ath_hal_chaintxdesc() is called to chain a list of aggregate and subframes
together. This requires a call to ath_hal_setuplasttxdesc() to complete
things.

Tested:

* AR9280 in station mode

TODO:

* I really should make sure that the descriptor contents get blanked
out correctly or garbage left over from aggregate frames may show
up in non-aggregate frames, leading to badness.


# 3e647f1c 27-Jul-2012 Adrian Chadd <adrian@FreeBSD.org>

Introduce a couple more fields in the rate scenario setup as part of
(future) TPC support in the AR9300 HAL.

This is effectively a no-op for the moment as (a) TPC isn't really
supported, (b) the AR9300 HAL isn't yet public, and (c) the existing
HAL code doesn't use these fields.

Obtained from: Qualcomm Atheros


# 59ab7720 22-Jul-2012 Adrian Chadd <adrian@FreeBSD.org>

Revert this; it wasn't supposed to be part of this commit.


# 3fdfc330 22-Jul-2012 Adrian Chadd <adrian@FreeBSD.org>

Begin separating out the TX DMA setup in preparation for TX EDMA support.

* Introduce TX DMA setup/teardown methods, mirroring what's done in
the RX path.

Although the TX DMA descriptor is setup via ath_desc_alloc() /
ath_desc_free(), there TX status descriptor ring will be allocated
in this path.

* Remove some of the TX EDMA capability probing from the RX path and
push it into the new TX EDMA path.


# b25c1f2a 16-Jun-2012 Adrian Chadd <adrian@FreeBSD.org>

A few nitpicks:

* Use ATH_RC_NUM instead of '4' when iterating over the ratecontrol series
array.

* A few style(9) fixes, hopefully no regressions here.

* Add some comments that better describe what's going on.


# 46f53901 12-Jun-2012 Adrian Chadd <adrian@FreeBSD.org>

Remove a duplicate definition.


# a108d2d6 11-Jun-2012 Adrian Chadd <adrian@FreeBSD.org>

Revert r233227 and followup commits as it breaks CCMP PN replay detection.

This showed up when doing heavy UDP throughput on SMP machines.

The problem with this is because the 802.11 sequence number is being
allocated separately to the CCMP PN replay number (which is assigned
during ieee80211_crypto_encap()).

Under significant throughput (200+ MBps) the TX path would be stressed
enough that frame TX/retry would force sequence number and PN allocation
to be out of order. So once the frames were reordered via 802.11 seqnos,
the CCMP PN would be far out of order, causing most frames to be discarded
by the receiver.

I've fixed this in some local work by being forced to:

(a) deal with the issues that lead to the parallel TX causing out of
order sequence numbers in the first place;
(b) fix all the packet queuing issues which lead to strange (but mostly
valid) TX.

I'll begin fixing these in a subsequent commit or five.

PR: kern/166190


# 781e7eaf 06-Apr-2012 Adrian Chadd <adrian@FreeBSD.org>

As I thought, this is a bad idea. When forming aggregates, the RTS/CTS
stuff and rate control lookup is only done on the first frame.


# 045bc788 06-Apr-2012 Adrian Chadd <adrian@FreeBSD.org>

Enforce the RTS aggregation limit if RTS/CTS protection is enabled;
if any subframes in an aggregate have different protection from the
first frame in the formed aggregate, don't add that frame to the
aggregate.

This is likely a suboptimal method (I think we'll mostly be OK marking
frames that have seqno's with the same protection as normal data frames)
but I'll just be cautious for now.


# 875a9451 06-Apr-2012 Adrian Chadd <adrian@FreeBSD.org>

Remove duplicate txflags field from ath_buf.

rename bf_state.bfs_flags to bf_state.bfs_txflags, as that is what
it effectively is.


# 091e146c 26-Mar-2012 Adrian Chadd <adrian@FreeBSD.org>

Use the assigned sequence number when checking if a retried packet is
within the BAW.

This regression was introduced in ane earlier commit by me to fix the
BAW seqno allocation-but-not-insertion-into-BAW race. Since it was only
ever using the to-be allocated sequence number, any frame retries
with the first frame in the BAW still in the software queue would
have constantly failed, as ni_txseqs[tid] would always be outside
the BAW.

TODO:

* Extract out the mostly common code here in the agg and non-agg ADDBA
case and stuff it into a single function.

PR: kern/166357


# 0b96ef63 19-Mar-2012 Adrian Chadd <adrian@FreeBSD.org>

Delay sequence number allocation for A-MPDU until just before the frame
is queued to the hardware.

Because multiple concurrent paths can execute ath_start(), multiple
concurrent paths can push frames into the software/hardware TX queue
and since preemption/interrupting can occur, there's the possibility
that a gap in time will occur between allocating the sequence number
and queuing it to the hardware.

Because of this, it's possible that a thread will have allocated a
sequence number and then be preempted by another thread doing the same.
If the second thread sneaks the frame into the BAW, the (earlier) sequence
number of the first frame will be now outside the BAW and will result
in the frame being constantly re-added to the tail of the queue.
There it will live until the sequence numbers cycle around again.

This also creates a hole in the RX BAW tracking which can also cause
issues.

This patch delays the sequence number allocation to occur only just before
the frame is going to be added to the BAW. I've been wanting to do this
anyway as part of a general code tidyup but I've not gotten around to it.
This fixes the PR.

However, it still makes it quite difficult to try and ensure in-order
queuing and dequeuing of frames. Since multiple copies of ath_start()
can be run at the same time (eg one TXing process thread, one TX completion
task/one RX task) the driver may end up having frames dequeued and pushed
into the hardware slightly/occasionally out of order.

And, to make matters more annoying, net80211 may have the same behaviour -
in the non-aggregation case, the TX code allocates sequence numbers
before it's thrown to the driver. I'll open another PR to investigate
this and potentially introduce some kind of final-pass TX serialisation
before frames are thrown to the hardware. It's also very likely worthwhile
adding some debugging code into ath(4) and net80211 to catch when/if this
does occur.

PR: kern/166190


# eb6f0de0 08-Nov-2011 Adrian Chadd <adrian@FreeBSD.org>

Introduce TX aggregation and software TX queue management
for Atheros AR5416 and later wireless devices.

This is a very large commit - the complete history can be
found in the user/adrian/if_ath_tx branch.

Legacy (ie, pre-AR5416) devices also use the per-software
TXQ support and (in theory) can support non-aggregation
ADDBA sessions. However, the net80211 stack doesn't currently
support this.

In summary:

TX path:

* queued frames normally go onto a per-TID, per-node queue
* some special frames (eg ADDBA control frames) are thrown
directly onto the relevant hardware queue so they can
go out before any software queued frames are queued.
* Add methods to create, suspend, resume and tear down an
aggregation session.
* Add in software retransmission of both normal and aggregate
frames.
* Add in completion handling of aggregate frames, including
parsing the block ack bitmap provided by the hardware.
* Write an aggregation function which can assemble frames into
an aggregate based on the selected rate control and channel
configuration.
* The per-TID queues are locked based on their target hardware
TX queue. This matches what ath9k/atheros does, and thus
simplified porting over some of the aggregation logic.
* When doing TX aggregation, stick the sequence number allocation
in the TX path rather than net80211 TX path, and protect it
by the TXQ lock.

Rate control:

* Delay rate control selection until the frame is about to
be queued to the hardware, so retried frames can have their
rate control choices changed. Frames with a static rate
control selection have that applied before each TX, just
to simplify the TX path (ie, not have "static" and "dynamic"
rate control special cased.)
* Teach ath_rate_sample about aggregates - both completion and
errors.
* Add an EWMA for tracking what the current "good" MCS rate is
based on failure rates.

Misc:

* Introduce a bunch of dirty hacks and workarounds so TID mapping
and net80211 frame inspection can be kept out of the net80211
layer. Because of the way this code works (and it's from Atheros
and Linux ath9k), there is a consistent, 1:1 mapping between
TID and AC. So we need to ensure that frames going to a specific
TID will _always_ end up on the right AC, and vice versa, or the
completion/locking will simply get very confused. I plan on
addressing this mess in the future.

Known issues:

* There is no BAR frame transmission just yet. A whole lot of
tidying up needs to occur before BAR frame TX can occur in the
"correct" place - ie, once the TID TX queue has been drained.

* Interface reset/purge/etc results in frames in the TX and RX
queues being removed. This creates holes in the sequence numbers
being assigned and the TX/RX AMPDU code (on either side) just
hangs.

* There's no filtered frame support at the present moment, so
stations going into power saving mode will simply have a number
of frames dropped - likely resulting in a traffic "hang".

* Raw frame TX is going to just not function with 11n aggregation.
Likely this needs to be modified to always override the sequence
number if the frame is going into an aggregation session.
However, general raw frame injection currently doesn't work in
general in net80211, so let's just ignore this for now until
this is sorted out.

* HT protection is just not implemented and won't be until the above
is sorted out. In addition, the AR5416 has issues RTS protecting
large aggregates (anything >8k), so the work around needs to be
ported and tested. Thus, this will be put on hold until the above
work is complete.

* The rate control module 'sample' is the only currently supported
module; onoe/amrr haven't been tested and have likely bit rotted
a little. I'll follow up with some commits to make them work again
for non-11n rates, but they won't be updated to handle 11n and
aggregation. If someone wishes to do so then they're welcome to
send along patches.

* .. and "sample" doesn't really do a good job of 11n TX. Specifically,
the metrics used (packet TX time and failure/success rates) isn't as
useful for 11n. It's likely that it should be extended to take into
account the aggregate throughput possible and then choose a rate
which maximises that. Ie, it may be acceptable for a higher MCS rate
with a higher failure to be used if it gives a more acceptable
throughput/latency then a lower MCS rate @ a lower error rate.
Again, patches will be gratefully accepted.

Because of this, ATH_ENABLE_11N is still not enabled by default.

Sponsored by: Hobnob, Inc.
Obtained from: Linux, Atheros


# 6246be6e 30-May-2011 Adrian Chadd <adrian@FreeBSD.org>

Enable setting the short-GI bit when TX'ing HT rates but only if the
hardware supports it.

Since ni->ni_htcap in hostap mode is what the remote end has advertised,
not what has been negotiated/decided, we need to check ourselves what
the current channel width is and what the hardware supports before
enabling short-GI.

It's important that short-GI isn't enabled when it isn't negotiated
and when the hardware doesn't support it (ie, short-gi for 20mhz channels
on any chip < AR9287.)

I've quickly verified this on the AR9285 in 11n mode.


# 532f2442 25-Mar-2011 Adrian Chadd <adrian@FreeBSD.org>

After discussing with Bernhard, the "right" way in net80211 to check
the channel width is ni->ni_chw, which is set to the negotiated channel
width. ni->ni_htflags is the capability, rather than the negotiated
value.

Teach both the TX path and the sample rate module about this.


# ab2e5836 24-Mar-2011 Adrian Chadd <adrian@FreeBSD.org>

Re-disable the setting of 2040/shortgi bits for now.

This seems to work fine for STA but not HT/20 AP mode.

Further discussion with net80211 people will need to take place
to ensure that the right flags are set based on the negotiated
capabilities of the remote peer, rather than whatever the local
parameters are.

Sending short-gi frames in 20mhz may work on some chips but
it certainly isn't supported on anything currently supported
by the HAL; and sending HT40 frames in HT20 mode just plain
won't work.


# 7b83029b 24-Mar-2011 Adrian Chadd <adrian@FreeBSD.org>

Flip back HT/40 and Short-GI (for 40mhz operation). These are now verified to work.


# 1198947a 22-Mar-2011 Adrian Chadd <adrian@FreeBSD.org>

Clean up setting the short preamble bit in the rate - this way it
is very obvious (and cleanly so) that it occurs for non-11n rates.


# fce6d676 13-Mar-2011 Adrian Chadd <adrian@FreeBSD.org>

The number of streams is not based on the interface stream count, but the
number of streams needed for that MCS rate.


# 4c957574 02-Mar-2011 Adrian Chadd <adrian@FreeBSD.org>

Disable trying to do HT/40 and short-GI TX.

These flags are just plain wrong - they're the node flags from negotiation,
not the configured flags. I'll jump in later on and figure out exactly
what should be done to properly set these two flags when in both STA mode
(ie, what the AP says is possible and what's configured) and AP mode
(ie, where the AP has a configuration, but then negotiates what's possible
with each node, so per-node configuration can and will differ.)

This allows the 11n 2.4ghz/ht20 mode to associate (but perform poorly still)
and exchange MCS rates with atheros reference APs and a Cisco/Linksys
E3000 AP.


# 2b5684a8 21-Feb-2011 Adrian Chadd <adrian@FreeBSD.org>

Don't set the RTS/CTS enable bit per-scenario if the global RTS/CTS
flags aren't set.


# 146b49d8 21-Feb-2011 Adrian Chadd <adrian@FreeBSD.org>

* Don't setup the scenario if the try count is 0
* Comment what else is going on during rate scenario setup


# 9a97e25e 20-Feb-2011 Adrian Chadd <adrian@FreeBSD.org>

Implement setting the short preamble bit if it's needed for the current node.

Short preamble rates are only for legacy rates; MCS rate codes don't have a short
preamble code like this.


# 7842451a 17-Feb-2011 Adrian Chadd <adrian@FreeBSD.org>

Just be double-sure short-gi isn't being enabled in 20mhz mode.


# 1325ba9d 13-Feb-2011 Adrian Chadd <adrian@FreeBSD.org>

This should be TX stream, not RX stream.


# bf26df36 11-Feb-2011 Adrian Chadd <adrian@FreeBSD.org>

The current code used the fields in ath_set11nratescenario() . Use them
correctly:

* pass in whether to allow the hardware to override the duration field
in the main data frame (durupdate_en) - PS_POLL frames in particular
don't have the duration bit overriden;
* there's no rts/cts duration here; that's done elsehwere


# f449ab1c 11-Feb-2011 Adrian Chadd <adrian@FreeBSD.org>

.. how'd this compile before I commit it and then not now?

Fixed.


# 6c9b00e1 11-Feb-2011 Adrian Chadd <adrian@FreeBSD.org>

The last parameter to ath_computedur_ht() is short-GI, not short-preamble.


# 4b44f6f2 01-Feb-2011 Adrian Chadd <adrian@FreeBSD.org>

Include some preliminary TX HT rate scenario setup code.

The AR5416 and later TX descriptors have new fields for supporting
11n bits (eg 20/40mhz mode, short/long GI) and enabling/disabling
RTS/CTS protection per rate.

These functions will be responsible for initialising the TX descriptors
for the AR5416 and later chips for both HT and legacy frames.

Beacon frames will remain using the non-11n TX descriptor setup for now;
Linux ath9k does much the same.

Note that these functions aren't yet used anywhere; a few more framework
changes are needed before all of the right rate information is available
for TX.