History log of /openbsd-current/sys/dev/pci/if_myx.c
Revision Date Author Comments
# 1.120 24-May-2024 jsg

remove unneeded includes; ok miod@


Revision tags: OPENBSD_7_5_BASE
# 1.119 10-Nov-2023 bluhm

Make ifq and ifiq interface MP safe.

Rename ifq_set_maxlen() to ifq_init_maxlen(). This function neither
uses WRITE_ONCE() nor a mutex and is called before the ifq mutex
is initialized. The new name expresses that it should be used only
during interface attach when there is no concurrency.

Protect ifq_len(), ifq_empty(), ifiq_len(), and ifiq_empty() with
READ_ONCE(). They can be used without lock as they only read a
single integer.

OK dlg@


Revision tags: OPENBSD_7_4_BASE
# 1.118 14-Jul-2023 claudio

struct sleep_state is no longer used, remove it.
Also remove the priority argument to sleep_finish() the code can use
the p_flag P_SINTR flag to know if the signal check is needed or not.
OK cheloha@ kettenis@ mpi@


# 1.117 28-Jun-2023 claudio

First step at removing struct sleep_state.

Pass the timeout and sleep priority not only to sleep_setup() but also
to sleep_finish(). With that sls_timeout and sls_catch can be removed
from struct sleep_state.

The timeout is now setup first thing in sleep_finish() and no longer as
last thing in sleep_setup(). This should not cause a noticeable difference
since the code run between sleep_setup() and sleep_finish() is minimal.

OK kettenis@


Revision tags: OPENBSD_7_1_BASE OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.116 11-Mar-2022 mpi

Constify struct cfattach.


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.115 08-Feb-2021 mpi

Simplify sleep_setup API to two operations in preparation for splitting
the SCHED_LOCK().

Putting a thread on a sleep queue is reduced to the following:

sleep_setup();
/* check condition or release lock */
sleep_finish();

Previous version ok cheloha@, jmatthew@, ok claudio@


# 1.114 17-Jan-2021 dlg

this hardware is fine with BUS_DMA_64BIT mappings.

this raises performance of tcpbench on an m3000 from ~3kpps and
~8MB/s to ~70kpps and ~191MB/s when transmitting, and ~10kpps and
~15MB/s to ~120kpps and 174MB/s when receiving.

i also tested this on a v245 and an m4000 a while back.


# 1.113 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.112 27-Nov-2020 kevlo

Add initialization of sc_sff_lock rwlock.

ok semarie@


Revision tags: OPENBSD_6_8_BASE
# 1.111 17-Jul-2020 dlg

name the rx rings so systat mb shows them.


# 1.110 17-Jul-2020 dlg

add kstats to myx.

myx is unusually minimal, so there's not a lot of information that
the chip provides. the most interesting is the number of packets
the chip drops cos of a lack of space on the rx rings.


# 1.109 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.108 03-Jul-2019 dlg

use ifiq_input return values to apply backpressure to rings.


# 1.107 16-Apr-2019 dlg

i2c reads are more reliable a byte at a time.

reading all 256 at a time was a nice idea, but meant page 0xa2 wasnt
appearing like it should. this follows what freebsd does more
closely too.


# 1.106 16-Apr-2019 dlg

make sff page reads work on little endian archs too. like amd64.

some modules seem to need more time when waiting for bytes while here.

hrvoje popovski hit the endian issue


# 1.105 15-Apr-2019 dlg

implement SIOCGIFSFFPAGE so ifconfig can get transceiver info.

myx doesn't allow i2c writes, so you can only read whatever page
the firmware is already pointing at on device 0xa0. if you try to
read another page it will return ENXIO.

tested on a 10G-PCIE-8A-R with an xfp module.


# 1.104 15-Apr-2019 dlg

trim some debug code that printed out the name of a command

the list of commands is going to grow, but the thought of keeping
the list in debug code up to date with it just makes me feel tired.

this prints the command id number instead in the same format we
represent it in the header.


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.103 01-Aug-2017 dlg

defer init of the myxmcl pool to mountroot, and enable pool cpu caches.

pool_cache_init cannot be called during autoconf because we cant
be confident about the number of cpus in the machine until the first
run of attaches.

mountroot is after autoconf, and myx already has code that runs
there for the firmware loading.

discussed with deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.102 07-Feb-2017 dlg

move the mbuf pools to m_pool_init and a single global memory limit

this replaces individual calls to pool_init, pool_set_constraints, and
pool_sethardlimit with calls to m_pool_init. m_pool_init inits the
mbuf pools with the mbuf pool allocator, and because of that doesnt
set per pool limits.

ok bluhm@ as part of a larger diff


# 1.101 24-Jan-2017 dlg

add support for multiple transmit ifqueues per network interface.

an ifq to transmit a packet is picked by the current traffic
conditioner (ie, priq or hfsc) by providing an index into an array
of ifqs. by default interfaces get a single ifq but can ask for
more using if_attach_queues().

the vast majority of our drivers still think there's a 1:1 mapping
between interfaces and transmit queues, so their if_start routines
take an ifnet pointer instead of a pointer to the ifqueue struct.
instead of changing all the drivers in the tree, drivers can opt
into using an if_qstart routine and setting the IFXF_MPSAFE flag.
the stack provides a compatibility wrapper from the new if_qstart
handler to the previous if_start handlers if IFXF_MPSAFE isnt set.

enabling hfsc on an interface configures it to transmit everything
through the first ifq. any other ifqs are left configured as priq,
but unused, when hfsc is enabled.

getting this in now so everyone can kick the tyres.

ok mpi@ visa@ (who provided some tweaks for cnmac).


# 1.100 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.99 31-Oct-2016 dlg

turns out these chips can handle buffers up to 9400 bytes in length.

raise the mtu to 9380 bytes so we can take advantage of the extra space.

i need to revisit the macro names at some point.


# 1.98 31-Oct-2016 dlg

revert 1.97 where i moved myx to using the system pools

my early revision board doesnt like it at all


# 1.97 28-Oct-2016 dlg

get rid of the custom pool in myx for jumbo frames.

now it asks the mbuf layer for the 9k from its pools.

a question from chris@ made me go look at the chip doco again and i
realised that the chip only requires 4 byte alignment for rx buffers,
no 4k alignment for jumbo buffers.

i also found that the chip is supposed to be able to rx up to 9400
bytes instead of 9000. ill fix that later though.


# 1.96 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


Revision tags: OPENBSD_6_0_BASE
# 1.95 23-May-2016 tedu

remove the function pointer from mbufs. this memory is shared with data
via unions, and we don't want to make it easy to control the target.
instead an integer index into an array of acceptable functions is used.
drivers using custom functions must register them to receive an index.
ok deraadt
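
The indexing scheme described in 1.95 can be illustrated with a minimal sketch. This is not the kernel's mbuf code: `ext_free_register`, `ext_free_call`, `example_free`, and `EXT_FREE_MAX` are hypothetical names chosen for illustration. The point is that a buffer carries only a small integer, and dispatch refuses any index that was never registered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the kernel's mbuf API. */
#define EXT_FREE_MAX 8u

typedef void (*ext_free_fn)(void *buf, size_t len, void *arg);

static ext_free_fn ext_free_table[EXT_FREE_MAX];
static unsigned int ext_free_count;

/* Register a free routine once, e.g. at attach time; returns its index. */
unsigned int
ext_free_register(ext_free_fn fn)
{
	assert(fn != NULL);
	assert(ext_free_count < EXT_FREE_MAX);
	ext_free_table[ext_free_count] = fn;
	return (ext_free_count++);
}

/*
 * Dispatch through the table. The buffer stores only the index, so a
 * corrupted value can at worst select another registered routine.
 * Returns 0 on success, -1 if the index was never registered.
 */
int
ext_free_call(unsigned int idx, void *buf, size_t len, void *arg)
{
	if (idx >= ext_free_count)
		return (-1);
	ext_free_table[idx](buf, len, arg);
	return (0);
}

/* Example routine: counts how often it ran via the arg pointer. */
void
example_free(void *buf, size_t len, void *arg)
{
	(void)buf;
	(void)len;
	(*(unsigned int *)arg)++;
}
```

A driver in this sketch would call `ext_free_register(example_free)` once and store the returned index in each buffer it hands out.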


# 1.94 13-Apr-2016 mpi

G/C IFQ_SET_READY().


# 1.93 13-Apr-2016 mpi

G/C IFQ_SET_READY().


Revision tags: OPENBSD_5_9_BASE
# 1.92 11-Dec-2015 mpi

Replace mountroothook_establish(9) by config_mountroot(9) a narrower API
similar to config_defer(9).

ok mikeb@, deraadt@


# 1.91 09-Dec-2015 dlg

rework the if_start mpsafe serialisation so it can serialise arbitrary work

work is represented by struct task.

the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.

this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified in between when a start
routine discovers theres no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.

by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.

tested on various nics
ok mpi@


# 1.90 03-Dec-2015 dlg

tell the stack myx_start is mpsafe.

as per the stack commit, the driver changes are:

1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)


# 1.89 01-Dec-2015 dlg

myx doesnt use atomic.h anymore.


# 1.88 25-Nov-2015 dlg

replace IFF_OACTIVE manipulation with mpsafe operations.

there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.

IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operations mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.

instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.

this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifq_{set,clr,is}_oactive API too.

ok kettenis@ mpi@ jmatthew@ deraadt@


# 1.87 24-Nov-2015 dlg

fix tx ring accounting in myx_start.

turns out i was calculating the number of packets (not descriptors)
on the tx ring, and then using that as the free space for descriptors.


# 1.86 19-Nov-2015 dlg

get rid of sc_tx_free and the atomic ops on it in myx_start and myx_txeof.

myx_start calculates the free space by reading the consumer index
and doing some maths, which lets us avoid the interlocked cpu ops.
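
A sketch of the kind of arithmetic 1.86 refers to, assuming free-running 32-bit producer and consumer counters (the log does not say how the driver stores its indices). `tx_ring_free` is a hypothetical helper, not a function from if_myx.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: free descriptor slots on a ring, given
 * free-running producer and consumer counters. Unsigned subtraction
 * yields the number of slots in use even after the counters wrap.
 */
uint32_t
tx_ring_free(uint32_t prod, uint32_t cons, uint32_t ring_size)
{
	uint32_t used = prod - cons;	/* correct across wraparound */

	assert(used <= ring_size);
	return (ring_size - used);
}
```

Because the result is derived from two plain reads, no interlocked update of a shared free count is needed. As 1.87 notes, the counters have to count descriptors rather than packets for the result to be usable as descriptor space.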


# 1.85 25-Oct-2015 mpi

arp_ifinit() is no longer needed.


# 1.84 29-Sep-2015 dlg

get rid of the mutex between access to the status block and myx_down

myx is unusual in that it has an explicit command to shut down the
chip that gets an interrupt when it's done. so myx_down sends the
command and has to sleep until it gets that interrupt. this moves
to using a single int to represent that state (so loads and stores
are atomic), and sleep_setup/sleep_finish in myx_down to wait for
it to change.

this has been running in production at work for a few months now
tested by chris@


# 1.83 01-Sep-2015 deraadt

free() firmware with right len; ok dlg


# 1.82 15-Aug-2015 dlg

do the global tx free accounting in myx_start with a single atomic op
instead of one per packet.

seems to let me send packets a little faster.


# 1.81 15-Aug-2015 dlg

rework the tx path to use a ring to keep track of dmamaps/mbufs.

this removes the myx_buf structure and uses myx_slot instead. theyre
the same except slots dont have list entry structures, so theyre
smaller.

this cuts out four mutex ops per packet out of the tx handling.
just have to get rid of the atomic op per packet in myx_start now.


# 1.80 14-Aug-2015 dlg

move to a per rx ring timeout for refilling empty rings.

this lets me get rid of the locking around the refilling of the rx ring.

the timeout only runs refill if the rx ring is empty. we know rxeof
wont try and refill it in that situation because there's no packets
on the ring so we wont get interrupts for it. therefore we dont
need to lock between the timeout and rxeof cos they cant run at the
same time.


# 1.79 14-Aug-2015 dlg

rework how we track the packets on the rx rings.

originally there were two mutex protected lists for rx packets, a
list of free packets, and a list of packets that were on the ring.
filling the ring popped packets off the free list, attached an mbuf
and dmamapped it, and pushed it onto the list of active packets.
the hw fills packets in order, so on rx completion we'd pop packets
off the active list, unmap the mbuf and shove it up the stack before
putting the packet on the free list.

the problem with the lists is that every rx ring operation resulted
in two mutex ops. so 4 mutex ops per packet after you do both fill
and rxeof.

this replaces the mutexed lists with rings that shadow the hardware
rings. filling the rx ring pushes a producer index along, while
rxeof chases it with a consumer. because we know only one thing can
do either of those tasks at a time, we can get away with not using
atomic ops for them.
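
The shadow ring from 1.79 can be sketched as follows. This is an illustration under stated assumptions, not the driver's code: `struct rx_ring`, `rx_ring_fill`, `rx_ring_complete`, and the ring size are hypothetical, and a plain pointer stands in for the mbuf and dmamap pair.

```c
#include <assert.h>
#include <stddef.h>

#define RX_RING_SLOTS 8u	/* hypothetical; must be a power of two */

struct rx_slot {
	void *buf;		/* stands in for the mbuf/dmamap pair */
};

struct rx_ring {
	struct rx_slot slots[RX_RING_SLOTS];
	unsigned int prod;	/* advanced only by the fill path */
	unsigned int cons;	/* advanced only by the rxeof path */
};

void
rx_ring_init(struct rx_ring *r)
{
	unsigned int i;

	for (i = 0; i < RX_RING_SLOTS; i++)
		r->slots[i].buf = NULL;
	r->prod = r->cons = 0;
}

/* Fill path: place a buffer in the next slot. Returns 0, or -1 if full. */
int
rx_ring_fill(struct rx_ring *r, void *buf)
{
	assert(buf != NULL);
	if (r->prod - r->cons == RX_RING_SLOTS)
		return (-1);
	r->slots[r->prod % RX_RING_SLOTS].buf = buf;
	r->prod++;
	return (0);
}

/* Completion path: take the oldest buffer, in order. NULL if empty. */
void *
rx_ring_complete(struct rx_ring *r)
{
	void *buf;

	if (r->cons == r->prod)
		return (NULL);
	buf = r->slots[r->cons % RX_RING_SLOTS].buf;
	r->slots[r->cons % RX_RING_SLOTS].buf = NULL;
	r->cons++;
	return (buf);
}
```

Each index has exactly one writer, which is the property the commit message relies on to avoid both mutexes and atomic ops. The sketch itself adds no synchronisation and is only valid under that single-writer assumption.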

there's more to be done, but this is a good first step.


Revision tags: OPENBSD_5_8_BASE
# 1.78 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.77 17-May-2015 chris

We don't need KERNEL_LOCK() around if_input() anymore, as if_input() has
appropriate locking around bpf now.

ok dlg@


# 1.76 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.75 20-Feb-2015 chris

Now that if_input() is a thing, use it

ok dlg@


# 1.74 18-Feb-2015 dlg

myri employees and their drivers for linux and solaris have repeatedly
told me that if you're going to rx into buffers greater than 4k in
size, they have to be aligned to a 4k boundary.

the mru of this chip is 9k, but ive been using the 12k mcl pool to
provide the alignment. however, if we move to putting 8 items on a
pool page there'll be enough slack space in the mcl12k pool pages
to allow item colouring, which in turn will break the chip requirement
above. in practice the chips i have seem to work fine with unaligned
buffers, but i dont want to risk breaking early revision chips.

this moves myx to using a private pool for allocating clusters for
the big rx ring. the item size is 9k, but we specify a 4k alignment
so every item we get out of it will be correct for the chip.


# 1.73 18-Feb-2015 dlg

enable pcie relaxed transaction ordering and bump the max payload
size up to 4k.

found while reading someone elses driver.


# 1.72 22-Dec-2014 tedu

unifdef INET


# 1.71 28-Oct-2014 dlg

the if_rxring accounting would get screwed up if the first mbuf to
be put on the ring couldnt be allocated.

this pulls the code that puts the mbufs on the ring out of myx_rx_fill
so it can return early if firstmb cant be allocated, which puts it
in the right place to return unused slots to the if_rxring.

this means myx rx wont lock up if you're DoSsed to the point where
you exhaust your mbuf pools and cant allocate mbufs for the ring.

ok jmatthew@


# 1.70 04-Oct-2014 dlg

replace mutexes to serialise the operations on the flag that restricts
the number of contexts that are refilling the rx rings with atomic
ops.

this is borrowed from code i wrote for the scsi midlayer but cant
put in yet because i havent got atomic.h up to scratch on all archs
yet. the archs myx runs on do have enough atomic.h to be fine though.


# 1.69 03-Oct-2014 dlg

refill the rx ring in myx_rxeof, not much later at the end of myx_intr.


# 1.68 03-Oct-2014 dlg

in rxeof, instead of taking the biglock on every packet to call bpf
and ether_input, queue all the mbufs onto an mbuf_list on the stack
and then take the biglock once outside the loop.
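
A sketch of the batching pattern in 1.68, with hypothetical stand-ins: `struct pkt` replaces the mbuf, `struct pkt_list` replaces the mbuf_list, and `biglock_enter` merely counts acquisitions so the effect is visible. None of these names come from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a packet, a list, and a lock that counts. */
struct pkt {
	struct pkt *next;
};

struct pkt_list {
	struct pkt *head;
	struct pkt **tailp;
	unsigned int len;
};

unsigned int biglock_acquisitions;
unsigned int pkts_delivered;

static void biglock_enter(void) { biglock_acquisitions++; }
static void biglock_leave(void) { }

void
pkt_list_init(struct pkt_list *pl)
{
	pl->head = NULL;
	pl->tailp = &pl->head;
	pl->len = 0;
}

/* Append without any locking: the list lives on the caller's stack. */
void
pkt_list_enqueue(struct pkt_list *pl, struct pkt *p)
{
	assert(p != NULL);
	p->next = NULL;
	*pl->tailp = p;
	pl->tailp = &p->next;
	pl->len++;
}

/* Hand the whole batch up under a single lock acquisition. */
unsigned int
pkt_list_deliver(struct pkt_list *pl)
{
	struct pkt *p, *next;
	unsigned int n = 0;

	biglock_enter();
	for (p = pl->head; p != NULL; p = next) {
		next = p->next;
		pkts_delivered++;	/* stands in for bpf + ether_input */
		n++;
	}
	biglock_leave();
	pkt_list_init(pl);
	return (n);
}
```

With this shape the lock cost is paid once per interrupt rather than once per packet, at the price of holding the lock for the whole batch.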


# 1.67 03-Oct-2014 dlg

we dont need the kernel lock to call bus_dmamap_load and unload thanks
to kettenis.

move the if_ipackets and if_opackets increments out of biglock too.
theyre only updated from the interrupt handler, which is only run
on a single cpu so there's no chance of the update racing. everywhere
else only reads them.


# 1.66 03-Oct-2014 dlg

dont need to hold the kernel lock to call MCLGETI and m_freem now.


# 1.65 03-Oct-2014 dlg

dont take the kernel lock on every interrupt in case we might change
the link state or to clear OACTIVE, just take it when we know we
really are going to do those things.


# 1.64 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


# 1.63 19-Aug-2014 dlg

in myx_start, replace

while (space) {
IFQ_POLL;
myx_dequeue(free descr);
IFQ_DEQUEUE;
etc;
}

with

while (space && myx_dequeue(free descr)) {
IFQ_DEQUEUE;
etc;
}


# 1.62 18-Aug-2014 dlg

dont rely on mbuf.h to provide pool.h.

ok miod@, who has offered to help with any MD fallout
ok guenther@


Revision tags: OPENBSD_5_6_BASE
# 1.61 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.60 10-Jul-2014 dlg

rings that dont rx packets dont need to be refilled.


# 1.59 08-Jul-2014 dlg

cut things that relied on mclgeti for rx ring accounting/restriction over
to using if_rxr.

cut the reporting systat did over to the rxr ioctl.

tested as much as i can on alpha, amd64, and sparc64.
mpi@ has run it on macppc.
ok mpi@


# 1.58 17-Jun-2014 dlg

whitespace fix.

im sick of fixing this by hand on all my boxes while hacking on
other stuff and having it pollute my diffs.

no functional change.


# 1.57 24-Mar-2014 dlg

nothing after the irq ack posting relies on it being ordered.


Revision tags: OPENBSD_5_5_BASE
# 1.56 10-Feb-2014 dlg

the mac addresses you program with MYXCMD_SET_MCASTGROUP are in a different
format to the one used for MYXCMD_SET_LLADDR. for reasons.

this lets ospf work if you dont happen to have PROMISC enabled on your
interface like my production firewalls happen to have, which is why i
never noticed this before.


# 1.55 05-Feb-2014 dlg

after running myx(4) without biglock in production for a few days
i discovered that there's a race between the interrupt code and
myx_start which causes the count of free tx descriptors to get
distorted, which eventually leads to a permanent setting of
IFF_OACTIVE, which in turn prevents the driver from transmitting
packets.

fixing that went horribly wrong when i then discovered that there's
a race between the interrupt handler and myx_down, where the interrupt
can tell myx_down to wake up and free all the rings while the
interrupt handler is still looking at them. free panics for all.

this moves the handling of the tx free count under the biglock (for
now), and moves myx_up and myx_down to managing a "driver state"
variable independently of the IFF_UP and IFF_RUNNING flags, and
very very careful reordering of the checks of that state variable
and the hardware state.

as a bonus we get to avoid excessive calls to myx_txeof and myx_rxeof
in the isr, and less stuff checked unconditionally. on the other
hand, the sc_state handling added some more checks so it might not
be a win overall.

tested on smp sparc64 with msi and nonmsi interrupts, and on amd64 smp
in production again.


# 1.54 31-Jan-2014 dlg

sc_function is set, but never used for anything useful. clean it up...


# 1.53 31-Jan-2014 dlg

sc_lladdr is never used, so we can get the space in the sc back.


# 1.52 23-Jan-2014 dlg

a lot of people have pointed out to me that taking a lock just to read an
int isnt necessary.


# 1.51 23-Jan-2014 dlg

factor the mutex/bus_space handling of the sts block out.


# 1.50 21-Jan-2014 dlg

introduce fine grained locking.

this doesnt give up the big lock coming from process context, only from
the interrupt side. it is excessively careful about when it takes
the big lock again. notably it goes to a lot of effort to not hold
a mutex while calling into other subsystems or before taking the
big lock.

ive been hitting it as hard as i can without problems.

intensely read by mpi@
ok claudio@ kettenis@


# 1.49 19-Jan-2014 dlg

white space fix


# 1.48 19-Jan-2014 dlg

introduce fine grained locking around the lists of packet handlers
myx maintains. this moves it away from relying on splnet to protect
them.


# 1.47 19-Jan-2014 dlg

hwflags is never used, so clean it up


# 1.46 19-Jan-2014 dlg

replace bcmp with memcmp


# 1.45 19-Jan-2014 dlg

bcopy to memcpy


# 1.44 19-Jan-2014 dlg

replace bzero with memset.


# 1.43 19-Jan-2014 dlg

all 64bit archs myx runs on support bus_space 8 things because of work i
did at n2k13.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.42 29-Jan-2013 brad

- Set ENETRESET within myx_ioctl() instead of calling myx_iff() directly, to be
consistent with other drivers.
- Clear IFF_ALLMULTI flag early and at the top of myx_iff().
- Set IFF_ALLMULTI when in promisc mode.

ok dlg@


# 1.41 25-Jan-2013 dlg

we go to a lot of effort to post the first tx descriptor last, but we
really should be trying to post everything except the flags field in the
first tx descriptor. this shuffles things around so the rest of that first
txd is posted as part of the "everything else" before its flags field.


# 1.40 25-Jan-2013 dlg

the myx_dmamem struct doesnt need a name.


# 1.39 21-Jan-2013 dlg

myx does reads and writes in one direction to packet buffers. lets try
STREAMING them.


# 1.38 15-Jan-2013 dlg

amd64 is currently broken cos it has no
bus_space_write_raw_region_8. disabling its use for now.


# 1.37 15-Jan-2013 dlg

use bus_space_write_raw_region_8 on 64bit archs when writing to the rings


# 1.36 14-Jan-2013 dlg

map the registers PREFETCHABLE so things that can do write combining can
try and do write combining like the myx doco likes.


# 1.35 14-Jan-2013 dlg

avoid extra bus_space barriers in the interrupt handler.


# 1.34 14-Jan-2013 dlg

when posting descriptors to the chips rings, avoid going write barrier
write barrier write barrier when using myx_write to post descriptors.

instead lets go write write write barrier by using the appropriate
bus_space write directly followed by a single bus_space barrier.

the story above is mostly true, except that myx wants us to write all the
descriptors except the first, barrier, and then write the first one out to
signal that the chip can proceed.

it is also worth noting that the barriers cover more address space than
what we actually wrote to. this makes the code much simpler, and avoids
generating extra fence operations (which is what barrier functions end up
as on most of our archs) when we wrap around the end of the ring. the
bus_space doco encourages this.
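
The posting order described above can be sketched in plain C. This is a simulation under stated assumptions, not driver code: an array stands in for the chip's ring memory, `ring_barrier` only counts calls where a real driver would issue a bus_space barrier, and `ring_post`, `ring_write`, and `RING_SLOTS` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SLOTS 8u	/* hypothetical ring size, power of two */

/* Stand-in for the chip's ring memory; a real driver uses bus_space. */
uint32_t ring_mem[RING_SLOTS];
unsigned int last_slot_written;	/* records ordering for the demo */
unsigned int barriers_issued;

static void
ring_write(unsigned int slot, uint32_t desc)
{
	ring_mem[slot % RING_SLOTS] = desc;
	last_slot_written = slot % RING_SLOTS;
}

static void
ring_barrier(void)
{
	barriers_issued++;	/* a real driver would issue a write barrier */
}

/*
 * Post n descriptors starting at slot `first`: write descriptors
 * 1..n-1, issue one barrier, then write descriptor 0 so the chip
 * only sees the chain once the rest of it is in place.
 */
void
ring_post(unsigned int first, const uint32_t *descs, unsigned int n)
{
	unsigned int i;

	assert(n >= 1 && n <= RING_SLOTS);
	for (i = 1; i < n; i++)
		ring_write(first + i, descs[i]);
	ring_barrier();
	ring_write(first, descs[0]);
}
```

Posting three descriptors at slot 6 of this 8-slot ring wraps around the end, yet still costs a single barrier, which mirrors the point made about barriers covering more address space than was written.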

bus_space use was discussed with krw@ kettenis@ deraadt@


# 1.33 14-Jan-2013 dlg

the myri doco suggests its nice to post stuff by filling in everything
in the rings except the first descriptor. once you've written as
much as you can out, then you go back and post the first descriptor
to signal that the chip should go ahead and work.


# 1.32 14-Jan-2013 dlg

;; is a long way of saying ;


# 1.31 29-Nov-2012 brad

Remove setting an initial assumed baudrate upon driver attach which is not
necessarily correct, there might not even be a link when attaching.

ok mikeb@ reyk@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.30 28-Nov-2011 blambert

Fix reversed error-handling gotos in myx_buf_fill(), which would lead to
either an mbuf leak or a NULL pointer dereference.

ok sthen@ claudio@ dlg@
testing claudio@ dlg@


Revision tags: OPENBSD_5_0_BASE
# 1.29 08-Aug-2011 dlg

myx requires the driver pad short ethernet frames to 60 bytes by
adding a descriptor pointing at zeroed bytes onto the end of transmit
chains. i was accounting for this extra descriptor when i was
completing the chain, but not when i was setting this up. this
meant the number of free descriptors kept growing until it overflowed.
at this point the check for space in the ring failed and packets
no longer flowed.

this counts the pad descriptor in the tx chain setup too.
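
A sketch of the accounting rule behind 1.29, assuming one descriptor per DMA segment plus one pad descriptor for frames shorter than 60 bytes. `tx_desc_count`, `tx_setup`, `tx_complete`, and the initial `tx_free` value are hypothetical; the point is that setup and completion must use the same count.

```c
#include <assert.h>

#define ETHER_MIN_NOCRC 60u	/* minimum frame length named in the log */

/* Hypothetical free-descriptor counter for a 256-entry ring. */
unsigned int tx_free = 256;

/*
 * Number of tx descriptors a packet consumes, given its DMA segment
 * count and length. A short frame needs one extra descriptor that
 * points at zeroed pad bytes.
 */
unsigned int
tx_desc_count(unsigned int nsegs, unsigned int pktlen)
{
	assert(nsegs >= 1);
	return (nsegs + (pktlen < ETHER_MIN_NOCRC ? 1 : 0));
}

/* Setup path: take descriptors, including the pad descriptor. */
void
tx_setup(unsigned int nsegs, unsigned int pktlen)
{
	unsigned int n = tx_desc_count(nsegs, pktlen);

	assert(n <= tx_free);
	tx_free -= n;
}

/* Completion path: give back exactly what setup took. */
void
tx_complete(unsigned int nsegs, unsigned int pktlen)
{
	tx_free += tx_desc_count(nsegs, pktlen);
}
```

If only the completion path counted the pad descriptor, every short frame would add one to the free count, which is the drift the commit message describes.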

ok deraadt@


# 1.28 23-Jun-2011 dlg

cope with empty rx rings by scheduling a timeout to keep trying until it
gets some packets onto the rings.

also annoying, but the hardware doesnt report empty rings, we have to
handle it ourselves.


# 1.27 23-Jun-2011 dlg

this chip has an annoying "feature" where it cannot report the link
state unless the chip is up and handling packets. while its down
it does not report the link state, so it is unknown.

this tweaks the link state handling, in particular it adds code to
myx_down so it moves the link state to unknown, ie, it correctly
reflects reality.

stupidity pointed out by deraadt


# 1.26 22-Jun-2011 deraadt

reset the tx_count on UP, since it may have been advanced from non-zero
by a previous use
ok claudio


# 1.25 22-Jun-2011 dlg

msi support. this is a complicated one...

ok kettenis@


# 1.24 22-Jun-2011 jsg

another myri10ge device matched by freebsd/linux drivers
ok dlg@


# 1.23 22-Jun-2011 dlg

oops, handle refill like i said i was going to two revisions ago.


# 1.22 22-Jun-2011 deraadt

set the mac address on the chip correctly (repair the byte order)
it now works on sparc64, too
ok dlg


# 1.21 22-Jun-2011 dlg

deraadt plugged his myx into a sparc64 and discovered 3 problems:

1. we want to write raw values to registers all the time, so promote the
myx_raw{read,write} to myx_{read,write} and use them everywhere. get rid
of the raw funcs.
2. i was setting the watermarks on the rx ring before knowing how big
they were.
3. rxfill in the interrupt handler could lose data if you loop on
sts_isvalid.

almost working now...

"please commit your diff" deraadt@


# 1.20 21-Jun-2011 dlg

do the unaligned dma tests so we can figure out if we need to fall
back to the unaligned firmware. apparently this is only an issue
on the "A" controllers which have been superseded, but those are
the chips we (openbsd devs) have.


# 1.19 21-Jun-2011 dlg

report the controllers part number. eg, i now know i have a
10G-PCIE-8A-R. dmesg looks like this:

myx0 at pci4 dev 0 function 0 "Myricom Z8E" rev 0x00: apic 1 int 8, model 10G-PCIE-8A-R, address 00:60:dd:47:c6:74


# 1.18 21-Jun-2011 dlg

wire up jumbos properly. the hardware supports up to 9018 bytes off
the wire (9000 + ether header + vlan tag), but has some cool alignment
requirements. if you want to use a single rx ring desc to point at
a jumbo it needs to start on a 4k boundary and be physically
contiguous. to ensure this im pulling frames from the 12k pool and
waiting for arianes diff to ensure mbufs are contig.
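
The numbers in 1.18 can be checked with a short sketch. The 14-byte Ethernet header and 4-byte VLAN tag are the standard sizes; the helper names are hypothetical. Physical contiguity, which the log also requires, cannot be expressed in a userland sketch and is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define JUMBO_PAYLOAD	9000u
#define ETHER_HDR_BYTES	14u	/* dst + src + ethertype */
#define VLAN_TAG_BYTES	4u
#define RX_ALIGN	((uintptr_t)4096)	/* the 4k boundary from the log */

/* Largest frame off the wire: payload + ether header + vlan tag. */
unsigned int
jumbo_frame_len(void)
{
	return (JUMBO_PAYLOAD + ETHER_HDR_BYTES + VLAN_TAG_BYTES);
}

/* Nonzero if addr starts on a 4k boundary. */
int
rx_buf_aligned(uintptr_t addr)
{
	return ((addr & (RX_ALIGN - 1)) == 0);
}

/* Round addr up to the next 4k boundary (no-op if already aligned). */
uintptr_t
rx_buf_align_up(uintptr_t addr)
{
	assert((RX_ALIGN & (RX_ALIGN - 1)) == 0);	/* power of two */
	return ((addr + RX_ALIGN - 1) & ~(RX_ALIGN - 1));
}
```

A 9018-byte frame does not fit in two 4k pages, which is consistent with the log's choice of the 12k pool for these buffers.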

direction from andrew gallatin. tested locally.


# 1.17 21-Jun-2011 deraadt

minor cleanups; ok dlg


# 1.16 20-Jun-2011 dlg

make the interrupt handler look more like what the doco suggests. seems to
fix a bad lockup i kept getting.


# 1.15 20-Jun-2011 dlg

dont need debug, the myx_cmd stuff works fine.


# 1.14 20-Jun-2011 dlg

i got myx working!


# 1.13 02-May-2011 chl

Do not check malloc return value against NULL, as M_WAITOK is used.

ok dlg@ krw@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.12 19-May-2010 oga

BUS_DMA_ZERO instead of alloc, map, bzero.

ok krw@


Revision tags: OPENBSD_4_7_BASE
# 1.11 13-Aug-2009 jasper

- consistify cfdriver for the ethernet drivers (0 -> NULL)

ok dlg@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.10 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.9 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.8 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.7 23-May-2008 brad

Simplify the combination use of pci_mapreg_type()/pci_mapreg_map() as
suggested by dlg@ awhile ago.

ok dlg@


Revision tags: OPENBSD_4_3_BASE
# 1.6 16-Jan-2008 thib

Set the baudrate with IF_Gbps(10); and remove an
XXX comment now that if_baudrate is 64bits.

ok reyk@


Revision tags: OPENBSD_4_2_BASE
# 1.5 01-Jun-2007 reyk

initialize the rings


# 1.4 31-May-2007 reyk

further improvement of the bus space i/o. firmware loading, booting,
and card initialization work now.

thanks to dlg@ who pointed me to the fact that
bus_space_write_region_N and bus_space_write_raw_region_N use count of
elements vs. size of buffer arguments.


# 1.3 31-May-2007 reyk

enable all debugging messages by default if the driver is compiled with
MYX_DEBUG


# 1.2 31-May-2007 reyk

fix the myx_write function


# 1.1 31-May-2007 reyk

initial bits of a new driver for the Myricom Myri-10G Lanai-Z8E 10Gb
Ethernet chipset. not working yet.

ok dlg@


# 1.118 14-Jul-2023 claudio

struct sleep_state is no longer used, remove it.
Also remove the priority argument to sleep_finish() the code can use
the p_flag P_SINTR flag to know if the signal check is needed or not.
OK cheloha@ kettenis@ mpi@


# 1.117 28-Jun-2023 claudio

First step at removing struct sleep_state.

Pass the timeout and sleep priority not only to sleep_setup() but also
to sleep_finish(). With that sls_timeout and sls_catch can be removed
from struct sleep_state.

The timeout is now setup first thing in sleep_finish() and no longer as
last thing in sleep_setup(). This should not cause a noticeable difference
since the code run between sleep_setup() and sleep_finish() is minimal.

OK kettenis@


Revision tags: OPENBSD_7_1_BASE OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.116 11-Mar-2022 mpi

Constify struct cfattach.


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.115 08-Feb-2021 mpi

Simplify sleep_setup API to two operations in preparation for splitting
the SCHED_LOCK().

Putting a thread on a sleep queue is reduce to the following:

sleep_setup();
/* check condition or release lock */
sleep_finish();

Previous version ok cheloha@, jmatthew@, ok claudio@


# 1.114 17-Jan-2021 dlg

this hardware is fine with BUS_DMA_64BIT mappings.

this raises performance of tcpbench on an m3000 from ~3kpps and
~8MB/s to ~70kpps and ~191MB/s when transmitting, and ~10kpps and
~15MB/s to ~120kpps and 174MB/s when receiving.

i also tested this on a v245 and an m4000 a while back.


# 1.113 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and removes the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.112 27-Nov-2020 kevlo

Add initialization of sc_sff_lock rwlock.

ok semarie@


Revision tags: OPENBSD_6_8_BASE
# 1.111 17-Jul-2020 dlg

name the rx rings so systat mb shows them.


# 1.110 17-Jul-2020 dlg

add kstats to myx.

myx is unusually minimal, so there's not a lot of information that
the chip provides. the most interesting is the number of packets
the chip drops cos of a lack of space on the rx rings.


# 1.109 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.108 03-Jul-2019 dlg

use ifiq_input return values to apply backpressure to rings.


# 1.107 16-Apr-2019 dlg

i2c reads are more reliable a byte at a time.

reading all 256 at a time was a nice idea, but meant page 0xa2 wasnt
appearing like it should. this follows what freebsd does more
closely too.


# 1.106 16-Apr-2019 dlg

make sff page reads work on little endian archs too. like amd64.

some modules seem to need more time when waiting for bytes while here.

hrvoje popovski hit the endian issue


# 1.105 15-Apr-2019 dlg

implement SIOCGIFSFFPAGE so ifconfig can get transceiver info.

myx doesn't allow i2c writes, so you can only read whatever page
the firmware is already pointing at on device 0xa0. if you try to
read another page it will return ENXIO.

tested on a 10G-PCIE-8A-R with an xfp module.


# 1.104 15-Apr-2019 dlg

trim some debug code that printed out the name of a command

the list of commands is going to grow, but the thought of keeping
the list in debug code up to date with it just makes me feel tired.

this prints the command id number instead in the same format we
represent it in the header.


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.103 01-Aug-2017 dlg

defer init of the myxmcl pool to mountroot, and enable pool cpu caches.

pool_cache_init cannot be called during autoconf because we cant
be confident about the number of cpus in the machine until the first
run of attaches.

mountroot is after autoconf, and myx already has code that runs
there for the firmware loading.

discussed with deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.102 07-Feb-2017 dlg

move the mbuf pools to m_pool_init and a single global memory limit

this replaces individual calls to pool_init, pool_set_constraints, and
pool_sethardlimit with calls to m_pool_init. m_pool_init inits the
mbuf pools with the mbuf pool allocator, and because of that doesnt
set per pool limits.

ok bluhm@ as part of a larger diff


# 1.101 24-Jan-2017 dlg

add support for multiple transmit ifqueues per network interface.

an ifq to transmit a packet is picked by the current traffic
conditioner (ie, priq or hfsc) by providing an index into an array
of ifqs. by default interfaces get a single ifq but can ask for
more using if_attach_queues().

the vast majority of our drivers still think there's a 1:1 mapping
between interfaces and transmit queues, so their if_start routines
take an ifnet pointer instead of a pointer to the ifqueue struct.
instead of changing all the drivers in the tree, drivers can opt
into using an if_qstart routine and setting the IFXF_MPSAFE flag.
the stack provides a compatibility wrapper from the new if_qstart
handler to the previous if_start handlers if IFXF_MPSAFE isnt set.

enabling hfsc on an interface configures it to transmit everything
through the first ifq. any other ifqs are left configured as priq,
but unused, when hfsc is enabled.

getting this in now so everyone can kick the tyres.

ok mpi@ visa@ (who provided some tweaks for cnmac).


# 1.100 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.99 31-Oct-2016 dlg

turns out these chips can handle buffers up to 9400 bytes in length.

raise the mtu to 9380 bytes so we can take advantage of the extra space.

i need to revisit the macro names at some point.


# 1.98 31-Oct-2016 dlg

revert 1.97 where i moved myx to using the system pools

my early revision board doesnt like it at all


# 1.97 28-Oct-2016 dlg

get rid of the custom pool in myx for jumbo frames.

now it asks the mbuf layer for the 9k from its pools.

a question from chris@ made me go look at the chip doco again and i
realised that the chip only requires 4 byte alignment for rx buffers,
no 4k alignment for jumbo buffers.

i also found that the chip is supposed to be able to rx up to 9400
bytes instead of 9000. ill fix that later though.


# 1.96 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


Revision tags: OPENBSD_6_0_BASE
# 1.95 23-May-2016 tedu

remove the function pointer from mbufs. this memory is shared with data
via unions, and we don't want to make it easy to control the target.
instead an integer index into an array of acceptable functions is used.
drivers using custom functions must register them to receive an index.
ok deraadt
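
The idea in 1.95 (store a small index rather than a raw function pointer, and only dispatch through a table of registered functions) can be sketched in userland C. This is an illustration under assumptions, not the kernel's code: extfree_register, extfree_call, EXTFREE_MAX and example_free are hypothetical names invented for the sketch.

```c
#include <stddef.h>

#define EXTFREE_MAX 4	/* hypothetical table size */

typedef void (*extfree_fn)(void *buf, size_t len);

static extfree_fn extfree_table[EXTFREE_MAX];
static unsigned int extfree_count;

/* register a free function; returns its index, or -1 if the table is full */
static int
extfree_register(extfree_fn fn)
{
	if (extfree_count >= EXTFREE_MAX)
		return (-1);
	extfree_table[extfree_count] = fn;
	return ((int)extfree_count++);
}

/* dispatch by index; indexes that were never registered are rejected */
static int
extfree_call(unsigned int idx, void *buf, size_t len)
{
	if (idx >= extfree_count)
		return (-1);
	extfree_table[idx](buf, len);
	return (0);
}

/* example free function that only counts how often it ran */
static unsigned int example_frees;

static void
example_free(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	example_frees++;
}
```

A corrupted index can at worst select another registered function or be rejected, which is the property the commit message is after.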


# 1.94 13-Apr-2016 mpi

G/C IFQ_SET_READY().


# 1.93 13-Apr-2016 mpi

G/C IFQ_SET_READY().


Revision tags: OPENBSD_5_9_BASE
# 1.92 11-Dec-2015 mpi

Replace mountroothook_establish(9) by config_mountroot(9) a narrower API
similar to config_defer(9).

ok mikeb@, deraadt@


# 1.91 09-Dec-2015 dlg

rework the if_start mpsafe serialisation so it can serialise arbitrary work

work is represented by struct task.

the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.

this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified in between when a start
routine discovers theres no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.

by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.

tested on various nics
ok mpi@


# 1.90 03-Dec-2015 dlg

tell the stack myx_start is mpsafe.

as per the stack commit, the driver changes are:

1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)


# 1.89 01-Dec-2015 dlg

myx doesnt use atomic.h anymore.


# 1.88 25-Nov-2015 dlg

replace IFF_OACTIVE manipulation with mpsafe operations.

there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.

IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operations mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.

instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.

this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifq_{set,clr,is}_oactive API too.

ok kettenis@ mpi@ jmatthew@ deraadt@


# 1.87 24-Nov-2015 dlg

fix tx ring accounting in myx_start.

turns out i was calculating the number of packets (not descriptors)
on the tx ring, and then using that as the free space for descriptors.


# 1.86 19-Nov-2015 dlg

get rid of sc_tx_free and the atomic ops on it in myx_start and myx_txeof.

myx_start calculates the free space by reading the consumer index
and doing some maths, which lets us avoid the interlocked cpu ops.
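
A minimal sketch of the "some maths" in 1.86, assuming free-running 32-bit producer and consumer indexes that count descriptors (the fix in 1.87 is about counting descriptors rather than packets). tx_free_descs and its parameters are hypothetical; the driver's actual index handling may differ.

```c
#include <stdint.h>

/*
 * free descriptors on a tx ring, given free-running indexes.
 * prod is owned by the start routine, cons is advanced on completion,
 * so the start routine only needs to read cons; no shared counter
 * has to be updated atomically.
 */
static uint32_t
tx_free_descs(uint32_t prod, uint32_t cons, uint32_t ring_size)
{
	uint32_t used = prod - cons;	/* well defined modulo 2^32 */

	return (ring_size - used);
}
```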


# 1.85 25-Oct-2015 mpi

arp_ifinit() is no longer needed.


# 1.84 29-Sep-2015 dlg

get rid of the mutex between access to the status block and myx_down

myx is unusual in that it has an explicit command to shut down the
chip that gets an interrupt when it's done. so myx_down sends the
command and has to sleep until it gets that interrupt. this moves
to using a single int to represent that state (so loads and stores
are atomic), and sleep_setup/sleep_finish in myx_down to wait for
it to change.

this has been running in production at work for a few months now
tested by chris@


# 1.83 01-Sep-2015 deraadt

free() firmware with right len; ok dlg


# 1.82 15-Aug-2015 dlg

do the global tx free accounting in myx_start with a single atomic op
instead of one per packet.

seems to let me send packets a little faster.


# 1.81 15-Aug-2015 dlg

rework the tx path to use a ring to keep track of dmamaps/mbufs.

this removes the myx_buf structure and uses myx_slot instead. theyre
the same except slots dont have list entry structures, so theyre
smaller.

this cuts out four mutex ops per packet out of the tx handling.
just have to get rid of the atomic op per packet in myx_start now.


# 1.80 14-Aug-2015 dlg

move to a per rx ring timeout for refilling empty rings.

this lets me get rid of the locking around the refilling of the rx ring.

the timeout only runs refill if the rx ring is empty. we know rxeof
wont try and refill it in that situation because there's no packets
on the ring so we wont get interrupts for it. therefore we dont
need to lock between the timeout and rxeof cos they cant run at the
same time.


# 1.79 14-Aug-2015 dlg

rework how we track the packets on the rx rings.

originally there were two mutex protected lists for rx packets, a
list of free packets, and a list of packets that were on the ring.
filling the ring popped packets off the free list, attached an mbuf
and dmamapped it, and pushed it onto the list of active packets.
the hw fills packets in order, so on rx completion we'd pop packets off
the active list, unmap the mbuf and shove it up the stack before
putting the packet on the free list.

the problem with the lists is that every rx ring operation resulted
in two mutex ops. so 4 mutex ops per packet after you do both fill
and rxeof.

this replaces the mutexed lists with rings that shadow the hardware
rings. filling the rx ring pushes a producer index along, while
rxeof chases it with a consumer. because we know only one thing can
do either of those tasks at a time, we can get away with not using
atomic ops for them.

there's more to be done, but this is a good first step.
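
The shadow ring described in 1.79 can be sketched as a single-producer, single-consumer ring in plain C. This is a simplified userland model, not the driver's code: struct shadow_ring, RING_SLOTS and both functions are hypothetical, and it relies on the commit's stated assumption that only one context fills and only one context completes at a time.

```c
#include <stddef.h>

#define RING_SLOTS 8	/* hypothetical ring size */

struct shadow_ring {
	unsigned int prod;	/* only advanced by the fill path */
	unsigned int cons;	/* only advanced by the rxeof path */
	void *slots[RING_SLOTS];
};

/* fill path: place a buffer on the ring; returns 0 when the ring is full */
static int
shadow_ring_fill(struct shadow_ring *r, void *buf)
{
	if (r->prod - r->cons == RING_SLOTS)
		return (0);
	r->slots[r->prod % RING_SLOTS] = buf;
	r->prod++;
	return (1);
}

/* rxeof path: take the oldest buffer; returns NULL when the ring is empty */
static void *
shadow_ring_rxeof(struct shadow_ring *r)
{
	void *buf;

	if (r->cons == r->prod)
		return (NULL);
	buf = r->slots[r->cons % RING_SLOTS];
	r->cons++;
	return (buf);
}
```

Because each index has exactly one writer, neither path needs a mutex or an atomic read-modify-write in this model.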


Revision tags: OPENBSD_5_8_BASE
# 1.78 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.77 17-May-2015 chris

We don't need KERNEL_LOCK() around if_input() anymore, as if_input() has
appropriate locking around bpf now.

ok dlg@


# 1.76 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.75 20-Feb-2015 chris

Now that if_input() is a thing, use it

ok dlg@


# 1.74 18-Feb-2015 dlg

myri employees and their drivers for linux and solaris have repeatedly
told me that if you're going to rx into buffers greater than 4k in
size, they have to be aligned to a 4k boundary.

the mru of this chip is 9k, but ive been using the 12k mcl pool to
provide the alignment. however, if we move to putting 8 items on a
pool page there'll be enough slack space in the mcl12k pool pages
to allow item colouring, which in turn will break the chip requirement
above. in practice the chips i have seem to work fine with unaligned
buffers, but i dont want to risk breaking early revision chips.

this moves myx to using a private pool for allocating clusters for
the big rx ring. the item size is 9k, but we specify a 4k alignment
so every item we get out of it will be correct for the chip.


# 1.73 18-Feb-2015 dlg

enable pcie relaxed transaction ordering and bump the max payload
size up to 4k.

found while reading someone elses driver.


# 1.72 22-Dec-2014 tedu

unifdef INET


# 1.71 28-Oct-2014 dlg

the if_rxring accounting would get screwed up if the first mbuf to
be put on the ring couldnt be allocated.

this pulls the code that puts the mbufs on the ring out of myx_rx_fill
so it can return early if firstmb cant be allocated, which puts it
in the right place to return unused slots to the if_rxring.

this means myx rx wont lock up if you're DoSsed to the point where
you exhaust your mbuf pools and cant allocate mbufs for the ring.

ok jmatthew@
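
The accounting rule in 1.71 (reserve slots, fill what can be allocated, hand the unused reservation back) can be modelled as below. struct slot_budget and its helpers are hypothetical stand-ins for the if_rxring accounting, and alloc_limited simulates an mbuf pool that runs dry; none of this is the driver's actual code.

```c
/* hypothetical stand-in for the rx ring slot accounting */
struct slot_budget {
	unsigned int avail;
};

/* reserve up to max slots from the budget */
static unsigned int
budget_get(struct slot_budget *b, unsigned int max)
{
	unsigned int n = (b->avail < max) ? b->avail : max;

	b->avail -= n;
	return (n);
}

/* return slots that were reserved but not used */
static void
budget_put(struct slot_budget *b, unsigned int n)
{
	b->avail += n;
}

/* fill as many reserved slots as alloc() allows, returning the rest */
static unsigned int
budget_fill(struct slot_budget *b, unsigned int max,
    int (*alloc)(void *), void *arg)
{
	unsigned int slots = budget_get(b, max);
	unsigned int filled = 0;

	while (filled < slots && alloc(arg))
		filled++;

	/* this also covers the case where the very first alloc fails */
	budget_put(b, slots - filled);
	return (filled);
}

/* simulated allocator: succeeds while *arg is non-zero */
static int
alloc_limited(void *arg)
{
	unsigned int *left = arg;

	if (*left == 0)
		return (0);
	(*left)--;
	return (1);
}
```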


# 1.70 04-Oct-2014 dlg

replace mutexes to serialise the operations on the flag that restricts
the number of contexts that are refilling the rx rings with atomic
ops.

this is borrowed from code i wrote for the scsi midlayer but cant
put in yet because i havent got atomic.h up to scratch on all archs
yet. the archs myx runs on do have enough atomic.h to be fine though.
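
One way such an atomic gate could look is sketched here with C11 <stdatomic.h> standing in for the kernel's atomic.h. It is an assumption about the general pattern, not a copy of the driver: struct ring_gate, gate_enter and gate_leave are hypothetical names. The first caller to enter does the work; later callers only leave a mark, and the worker sees on leave that it has to go around again.

```c
#include <stdatomic.h>

struct ring_gate {
	atomic_uint running;	/* 0: idle, 1: one worker, >1: work was requested meanwhile */
};

/* returns 1 if the caller became the worker, 0 if someone else already is */
static int
gate_enter(struct ring_gate *g)
{
	return (atomic_fetch_add(&g->running, 1) + 1 == 1);
}

/*
 * returns 1 if the worker may stop, 0 if another context asked for
 * work in the meantime and the worker should run once more.
 */
static int
gate_leave(struct ring_gate *g)
{
	unsigned int expected = 1;

	if (atomic_compare_exchange_strong(&g->running, &expected, 0))
		return (1);

	/* collapse all pending requests into a single extra pass */
	atomic_store(&g->running, 1);
	return (0);
}
```

A worker would typically loop: do the refill, then repeat while gate_leave() returns 0.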


# 1.69 03-Oct-2014 dlg

refill the rx ring in myx_rxeof, not much later at the end of myx_intr.


# 1.68 03-Oct-2014 dlg

in rxeof, instead of taking the biglock on every packet to call bpf
and ether_input, queue all the mbufs onto an mbuf_list on the stack
and then take the biglock once outside the loop.


# 1.67 03-Oct-2014 dlg

we dont need the kernel lock to call bus_dmamap_load and unload thanks
to kettenis.

move the if_ipacket and if_opacket increments out of biglock too.
theyre only updated from the interrupt handler, which is only run
on a single cpu so there's no chance of the update racing. everywhere
else only reads them.


# 1.66 03-Oct-2014 dlg

dont need to hold the kernel lock to call MCLGETI and m_freem now.


# 1.65 03-Oct-2014 dlg

dont take the kernel lock on every interrupt in case we might change
the link state or to clear OACTIVE, just take it when we know we
really are going to do those things.


# 1.64 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


# 1.63 19-Aug-2014 dlg

in myx_start, replace

while (space) {
	IFQ_POLL;
	myx_dequeue(free descr);
	IFQ_DEQUEUE;
	etc;
}

with

while (space && myx_dequeue(free descr)) {
	IFQ_DEQUEUE;
	etc;
}


# 1.62 18-Aug-2014 dlg

dont rely on mbuf.h to provide pool.h.

ok miod@, who has offered to help with any MD fallout
ok guenther@


Revision tags: OPENBSD_5_6_BASE
# 1.61 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.60 10-Jul-2014 dlg

rings that dont rx packets dont need to be refilled.


# 1.59 08-Jul-2014 dlg

cut things that relied on mclgeti for rx ring accounting/restriction over
to using if_rxr.

cut the reporting systat did over to the rxr ioctl.

tested as much as i can on alpha, amd64, and sparc64.
mpi@ has run it on macppc.
ok mpi@


# 1.58 17-Jun-2014 dlg

whitespace fix.

im sick of fixing this by hand on all my boxes while hacking on
other stuff and having it pollute my diffs.

no functional change.


# 1.57 24-Mar-2014 dlg

nothing after the irq ack posting relies on it being ordered.


Revision tags: OPENBSD_5_5_BASE
# 1.56 10-Feb-2014 dlg

the mac addresses you program with MYXCMD_SET_MCASTGROUP are in a different
format to the one used for MYXCMD_SET_LLADDR. for reasons.

this lets ospf work if you dont happen to have PROMISC enabled on your
interface like my production firewalls happen to have, which is why i
never noticed this before.


# 1.55 05-Feb-2014 dlg

after running myx(4) without biglock in production for a few days
i discovered that there's a race between the interrupt code and
myx_start which causes the count of free tx descriptors to get
distorted, which eventually leads to a permanent setting of
IFF_OACTIVE, which in turn prevents the driver from transmitting
packets.

fixing that went horribly wrong when i then discovered that there's
a race between the interrupt handler and myx_down, where the interrupt
can tell myx_down to wake up and free all the rings while the
interrupt handler is still looking at them. free panics for all.

this moves the handling of the tx free count under the biglock (for
now), and moves myx_up and myx_down to managing a "driver state"
variable independently of the IFF_UP and IFF_RUNNING flags, and
very very careful reordering of the checks of that state variable
and the hardware state.

as a bonus we get to avoid excessive calls to myx_txeof and myx_rxeof
in the isr, and less stuff checked unconditionally. on the other
hand, the sc_state handling added some more checks so it might not
be a win overall.

tested on smp sparc64 with msi and nonmsi interrupts, and on amd64 smp
in production again.


# 1.54 31-Jan-2014 dlg

sc_function is set, but never used for anything useful. clean it up...


# 1.53 31-Jan-2014 dlg

sc_lladdr is never used, so we can get the space in the sc back.


# 1.52 23-Jan-2014 dlg

a lot of people have pointed out to me that taking a lock just to read an
int isnt necessary.


# 1.51 23-Jan-2014 dlg

factor the mutex/bus_space handling of the sts block out.


# 1.50 21-Jan-2014 dlg

introduce fine grained locking.

this doesnt give up the big lock coming from process context, only from
the interrupt side. it is excessively careful about when it takes
the big lock again. notably it goes to a lot of effort to not hold
a mutex while calling into other subsystems or before taking the
big lock.

ive been hitting it as hard as i can without problems.

intensely read by mpi@
ok claudio@ kettenis@


# 1.49 19-Jan-2014 dlg

white space fix


# 1.48 19-Jan-2014 dlg

introduce fine grained locking around the lists of packet handlers
myx maintains. this moves it away from relying on splnet to protect
them.


# 1.47 19-Jan-2014 dlg

hwflags is never used, so clean it up


# 1.46 19-Jan-2014 dlg

replace bcmp with memcmp


# 1.45 19-Jan-2014 dlg

bcopy to memcpy


# 1.44 19-Jan-2014 dlg

replace bzero with memset.


# 1.43 19-Jan-2014 dlg

all 64bit archs myx runs on support bus_space 8 things because of work i
did at n2k13.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.42 29-Jan-2013 brad

- Set ENETRESET within myx_ioctl() instead of calling myx_iff() directly, to be
consistent with other drivers.
- Clear IFF_ALLMULTI flag early and at the top of myx_iff().
- Set IFF_ALLMULTI when in promisc mode.

ok dlg@


# 1.41 25-Jan-2013 dlg

we go to a lot of effort to post the first tx descriptor last, but we
really should be trying to post everything except the flags field in the
first tx descriptor. this shuffles things around so the rest of that first
txd is posted as part of the "everything else" before its flags field.


# 1.40 25-Jan-2013 dlg

the myx_dmamem struct doesnt need a name.


# 1.39 21-Jan-2013 dlg

myx does reads and writes in one direction to packet buffers. lets try
STREAMING them.


# 1.38 15-Jan-2013 dlg

amd64 is currently broken cos it has no
bus_space_write_raw_region_8. disabling its use for now.


# 1.37 15-Jan-2013 dlg

use bus_space_write_raw_region_8 on 64bit archs when writing to the rings


# 1.36 14-Jan-2013 dlg

map the registers PREFETCHABLE so things that can do write combining can
try and do write combining like the myx doco likes.


# 1.35 14-Jan-2013 dlg

avoid extra bus_space barriers in the interrupt handler.


# 1.34 14-Jan-2013 dlg

when posting descriptors to the chips rings, avoid going write barrier
write barrier write barrier when using myx_write to post descriptors.

instead let it go write write write barrier by using the appropriate
bus_space write directly followed by a single bus_space barrier.

the story above is mostly true, except that myx wants us to write all the
descriptors except the first, barrier, and then write the first one out to
signal that the chip can proceed.

it is also worth noting that the barriers cover more address space than
what we actually wrote to. this makes the code much simpler, and avoids
generating extra fence operations (which is what barrier functions end up
as on most of our archs) when we wrap around the end of the ring. the
bus_space doco encourages this.

bus_space use was discussed with krw@ kettenis@ deraadt@
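
The posting order from 1.33 and 1.34 (everything except the first descriptor, one barrier, then the first descriptor) can be modelled in userland. This is a sketch under assumptions: struct fake_ring stands in for the chip's ring memory, atomic_thread_fence() stands in for the single bus_space barrier, and all names are hypothetical. The order[] array exists only so the write sequence can be inspected.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8	/* hypothetical ring size */

struct fake_ring {
	uint64_t desc[RING_SIZE];	/* stand-in for the chip's ring */
	unsigned int order[RING_SIZE];	/* sequence number of each write */
	unsigned int seq;
};

static void
ring_write(struct fake_ring *r, unsigned int idx, uint64_t v)
{
	r->desc[idx % RING_SIZE] = v;
	r->order[idx % RING_SIZE] = ++r->seq;
}

/* post a chain of n descriptors starting at ring index first */
static void
post_chain(struct fake_ring *r, unsigned int first, const uint64_t *txd,
    size_t n)
{
	size_t i;

	if (n == 0)
		return;

	/* write write write: everything except the first descriptor */
	for (i = 1; i < n; i++)
		ring_write(r, first + (unsigned int)i, txd[i]);

	/* one barrier for the whole chain, not one per write */
	atomic_thread_fence(memory_order_release);

	/* the first descriptor goes last and tells the chip to proceed */
	ring_write(r, first, txd[0]);
}
```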


# 1.33 14-Jan-2013 dlg

the myri doco suggests its nice to post stuff by filling in everything
in the rings except the first descriptor. once you've written as
much as you can out, then you go back and post the first descriptor
to signal that the chip should go ahead and work.


# 1.32 14-Jan-2013 dlg

;; is a long way of saying ;


# 1.31 29-Nov-2012 brad

Remove setting an initial assumed baudrate upon driver attach which is not
necessarily correct, there might not even be a link when attaching.

ok mikeb@ reyk@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.30 28-Nov-2011 blambert

Fix reversed error-handling gotos in myx_buf_fill(), which would lead to
either an mbuf leak or a NULL pointer dereference.

ok sthen@ claudio@ dlg@
testing claudio@ dlg@


Revision tags: OPENBSD_5_0_BASE
# 1.29 08-Aug-2011 dlg

myx requires the driver pad short ethernet frames to 60 bytes by
adding a descriptor pointing at zeroed bytes onto the end of transmit
chains. i was accounting for this extra descriptor when i was
completing the chain, but not when i was setting this up. this
meant the number of free descriptors kept growing until it overflowed.
at this point the check for space in the ring failed and packets
no longer flowed.

this counts the pad descriptor in the tx chain setup too.

ok deraadt@
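
The accounting fix in 1.29 comes down to counting the pad descriptor on both sides. A minimal sketch, assuming one descriptor per dma segment plus one extra for frames shorter than 60 bytes; MIN_FRAME_NOCRC and tx_descs_for are hypothetical names, not the driver's:

```c
#define MIN_FRAME_NOCRC 60	/* minimum frame length the chip wants, per the commit message */

/* descriptors consumed by one packet, including the pad descriptor */
static unsigned int
tx_descs_for(unsigned int nsegs, unsigned int pktlen)
{
	unsigned int pad = (pktlen < MIN_FRAME_NOCRC) ? 1 : 0;

	return (nsegs + pad);
}
```

If the same function is used when the chain is set up and when it is completed, the free count cannot drift in the way the commit describes.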


# 1.28 23-Jun-2011 dlg

cope with empty rx rings by scheduling a timeout to keep trying until it
gets some packets onto the rings.

also annoying, but the hardware doesnt report empty rings, we have to
handle it ourselves.


# 1.27 23-Jun-2011 dlg

this chip has an annoying "feature" where it cannot report the link
state unless the chip is up and handling packets. while its down
it does not report the link state, so it is unknown.

this tweaks the link state handling, in particular it adds code to
myx_down so it moves the link state to unknown, ie, it correctly
reflects reality.

stupidity pointed out by deraadt


# 1.26 22-Jun-2011 deraadt

reset the tx_count on UP, since it may have been advanced from non-zero
by a previous use
ok claudio


# 1.25 22-Jun-2011 dlg

msi support. this is a complicated one...

ok kettenis@


# 1.24 22-Jun-2011 jsg

another myri10ge device matched by freebsd/linux drivers
ok dlg@


# 1.23 22-Jun-2011 dlg

oops, handle refill like i said i was going to two revisions ago.


# 1.22 22-Jun-2011 deraadt

set the mac address on the chip correctly (repair the byte order)
it now works on sparc64, too
ok dlg


# 1.21 22-Jun-2011 dlg

deraadt plugged his myx into a sparc64 and discovered 3 problems:

1. we want to write raw values to registers all the time, so promote the
myx_raw{read,write} to myx_{read,write} and use them everywhere. get rid
of the raw funcs.
2. i was setting the watermarks on the rx ring before knowing how big
they were.
3. rxfill in the interrupt handler could lose data if you loop on
sts_isvalid.

almost working now...

"please commit your diff" deraadt@


# 1.20 21-Jun-2011 dlg

do the unaligned dma tests so we can figure out if we need to fall
back to the unaligned firmware. apparently this is only an issue
on the "A" controllers which have been superseded, but those are
the chips we (openbsd devs) have.


# 1.19 21-Jun-2011 dlg

report the controllers part number. eg, i now know i have a
10G-PCIE-8A-R. dmesg looks like this:

myx0 at pci4 dev 0 function 0 "Myricom Z8E" rev 0x00: apic 1 int 8, model 10G-PCIE-8A-R, address 00:60:dd:47:c6:74


# 1.18 21-Jun-2011 dlg

wire up jumbos properly. the hardware supports up to 9018 bytes off
the wire (9000 + ether header + vlan tag), but has some cool alignment
requirements. if you want to use a single rx ring desc to point at
a jumbo it needs to start on a 4k boundary and be physically
contiguous. to ensure this im pulling frames from the 12k pool and
waiting for arianes diff to ensure mbufs are contig.

direction from andrew gallatin. tested locally.
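
The arithmetic behind the 9018 byte figure and the 4k boundary requirement from 1.18 can be written out as a small check. ETHER_HDR_LEN (14) and ETHER_VLAN_ENCAP_LEN (4) carry their usual values; JUMBO_PAYLOAD, JUMBO_FRAME and is_4k_aligned are hypothetical names for this sketch only.

```c
#include <stdint.h>

#define ETHER_HDR_LEN		14	/* dst + src + ethertype */
#define ETHER_VLAN_ENCAP_LEN	4	/* 802.1Q tag */
#define JUMBO_PAYLOAD		9000
#define JUMBO_FRAME \
	(JUMBO_PAYLOAD + ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN)

/* true if addr starts on a 4k boundary */
static int
is_4k_aligned(uintptr_t addr)
{
	return ((addr & 0xfff) == 0);
}
```

A 9018 byte frame does not fit in two 4k pages, so a 4k aligned, physically contiguous buffer for it has to span three pages, which is consistent with pulling frames from the 12k pool.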


# 1.17 21-Jun-2011 deraadt

minor cleanups; ok dlg


# 1.16 20-Jun-2011 dlg

make the interrupt handler look more like what the doco suggests. seems to
fix a bad lockup i kept getting.


# 1.15 20-Jun-2011 dlg

dont need debug, the myx_cmd stuff works fine.


# 1.14 20-Jun-2011 dlg

i got myx working!


# 1.13 02-May-2011 chl

Do not check malloc return value against NULL, as M_WAITOK is used.

ok dlg@ krw@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.12 19-May-2010 oga

BUS_DMA_ZERO instead of alloc, map, bzero.

ok krw@


Revision tags: OPENBSD_4_7_BASE
# 1.11 13-Aug-2009 jasper

- consistify cfdriver for the ethernet drivers (0 -> NULL)

ok dlg@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.10 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.9 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.8 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.7 23-May-2008 brad

Simplify the combination use of pci_mapreg_type()/pci_mapreg_map() as
suggested by dlg@ awhile ago.

ok dlg@


Revision tags: OPENBSD_4_3_BASE
# 1.6 16-Jan-2008 thib

Set the baudrate with IF_Gbps(10); and remove an
XXX comment now that if_baudrate is 64bits.

ok reyk@


Revision tags: OPENBSD_4_2_BASE
# 1.5 01-Jun-2007 reyk

initialize the rings


# 1.4 31-May-2007 reyk

further improvement of the bus space i/o. firmware loading, booting,
and card initialization works now.

thanks to dlg@ who pointed me to the fact that
bus_space_write_region_N and bus_space_write_raw_region_N use count of
elements vs. size of buffer arguments.


# 1.3 31-May-2007 reyk

enable all debugging messages by default if the driver is compiled with
MYX_DEBUG


# 1.2 31-May-2007 reyk

fix the myx_write function


# 1.1 31-May-2007 reyk

initial bits of a new driver for the Myricom Myri-10G Lanai-Z8E 10Gb
Ethernet chipset. not working yet.

ok dlg@


# 1.117 28-Jun-2023 claudio

First step at removing struct sleep_state.

Pass the timeout and sleep priority not only to sleep_setup() but also
to sleep_finish(). With that sls_timeout and sls_catch can be removed
from struct sleep_state.

The timeout is now setup first thing in sleep_finish() and no longer as
last thing in sleep_setup(). This should not cause a noticeable difference
since the code run between sleep_setup() and sleep_finish() is minimal.

OK kettenis@


Revision tags: OPENBSD_7_1_BASE OPENBSD_7_2_BASE OPENBSD_7_3_BASE
# 1.116 11-Mar-2022 mpi

Constify struct cfattach.


Revision tags: OPENBSD_6_9_BASE OPENBSD_7_0_BASE
# 1.115 08-Feb-2021 mpi

Simplify sleep_setup API to two operations in preparation for splitting
the SCHED_LOCK().

Putting a thread on a sleep queue is reduce to the following:

sleep_setup();
/* check condition or release lock */
sleep_finish();

Previous version ok cheloha@, jmatthew@, ok claudio@


# 1.114 17-Jan-2021 dlg

this hardware is fine with BUS_DMA_64BIT mappings.

this raises performance of tcpbench on an m3000 from ~3kpps and
~8MB/s to ~70kpps and ~191MB/s when transmitting, and ~10kpps and
~15MB/s to ~120kpps and 174MB/s when receiving.

i also tested this on a v245 and an m4000 a while back.


# 1.113 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and removes the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.112 27-Nov-2020 kevlo

Add initialization of sc_sff_lock rwlock.

ok semarie@


Revision tags: OPENBSD_6_8_BASE
# 1.111 17-Jul-2020 dlg

name the rx rings so systat mb shows them.


# 1.110 17-Jul-2020 dlg

add kstats to myx.

myx is unusually minimal, so there's not a lot of information that
the chip provides. the most interesting is the number of packets
the chip drops cos of a lack of space on the rx rings.


# 1.109 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.108 03-Jul-2019 dlg

use ifiq_input return values to apply backpressure to rings.


# 1.107 16-Apr-2019 dlg

i2c reads are more reliable a byte at a time.

reading all 256 at a time was a nice idea, but meant page 0xa2 wasnt
appearing like it should. this follows what freebsd does more
closely too.


# 1.106 16-Apr-2019 dlg

make sff page reads work on little endian archs too. like amd64.

some modules seem to need more time when waiting for bytes while here.

hrvoje popovski hit the endian issue


# 1.105 15-Apr-2019 dlg

implement SIOCGIFSFFPAGE so ifconfig can get transceiver info.

myx doesn't allow i2c writes, so you can only read whatever page
the firmware is already pointing at on device 0xa0. if you try to
read another page it will return ENXIO.

tested on a 10G-PCIE-8A-R with an xfp module.


# 1.104 15-Apr-2019 dlg

trim some debug code that printed out the name of a command

the list of commands is going to grow, but the thought of keeping
the list in debug code up to date with it just makes me feel tired.

this prints the command id number instead in the same format we
represent it in the header.


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.103 01-Aug-2017 dlg

defer init of the myxmcl pool to mountroot, and enable pool cpu caches.

pool_cache_init cannot be called during autoconf because we cant
be confident about the number of cpus in the machine until the first
run of attaches.

mountroot is after autoconf, and myx already has code that runs
there for the firmware loading.

discussed with deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.102 07-Feb-2017 dlg

move the mbuf pools to m_pool_init and a single global memory limit

this replaces individual calls to pool_init, pool_set_constraints, and
pool_sethardlimit with calls to m_pool_init. m_pool_init inits the
mbuf pools with the mbuf pool allocator, and because of that doesnt
set per pool limits.

ok bluhm@ as part of a larger diff


# 1.101 24-Jan-2017 dlg

add support for multiple transmit ifqueues per network interface.

an ifq to transmit a packet is picked by the current traffic
conditioner (ie, priq or hfsc) by providing an index into an array
of ifqs. by default interfaces get a single ifq but can ask for
more using if_attach_queues().

the vast majority of our drivers still think there's a 1:1 mapping
between interfaces and transmit queues, so their if_start routines
take an ifnet pointer instead of a pointer to the ifqueue struct.
instead of changing all the drivers in the tree, drivers can opt
into using an if_qstart routine and setting the IFXF_MPSAFE flag.
the stack provides a compatability wrapper from the new if_qstart
handler to the previous if_start handlers if IFXF_MPSAFE isnt set.

enabling hfsc on an interface configures it to transmit everything
through the first ifq. any other ifqs are left configured as priq,
but unused, when hfsc is enabled.

getting this in now so everyone can kick the tyres.

ok mpi@ visa@ (who provided some tweaks for cnmac).


# 1.100 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.99 31-Oct-2016 dlg

turns out these chips can handle buffers up to 9400 bytes in length.

raise the mtu to 9380 bytes so we can take advantage of the extra space.

i need to revisit the macro names at some point.


# 1.98 31-Oct-2016 dlg

revert 1.97 where i moved myx to using the system pools

my early revision board doesnt like it at all


# 1.97 28-Oct-2016 dlg

get rid of the custom pool in myx for jumbo frames.

now it asks the mbuf layer for the 9k from its pools.

a question from chris@ made me go look at the chip doco again and i
realised that the chip only requires 4 byte alignment for rx buffers,
no 4k alignment for jumbo buffers.

i also found that the chip is supposed to be able to rx up to 9400
bytes instead of 9000. ill fix that later though.


# 1.96 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


Revision tags: OPENBSD_6_0_BASE
# 1.95 23-May-2016 tedu

remove the function pointer from mbufs. this memory is shared with data
via unions, and we don't want to make it easy to control the target.
instead an integer index into an array of acceptable functions is used.
drivers using custom functions must register them to receive an index.
ok deraadt
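
A minimal sketch of the index-instead-of-pointer idea described above, not the kernel's actual mbuf code. All names here (ext_free_register, ext_free_call, the table size) are hypothetical. A buffer stores only the small integer returned at registration, so corrupting it can at worst select another registered function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the kernel's mbuf API. */
typedef void (*ext_free_fn)(void *buf, size_t len);

#define EXT_FREE_MAX 8

static ext_free_fn ext_free_table[EXT_FREE_MAX];
static unsigned int ext_free_count;

/* Register a free function once; the returned index is what a buffer stores. */
unsigned int
ext_free_register(ext_free_fn fn)
{
	assert(fn != NULL);
	assert(ext_free_count < EXT_FREE_MAX);
	ext_free_table[ext_free_count] = fn;
	return (ext_free_count++);
}

/* Dispatch through the table; an index that was never registered is rejected. */
int
ext_free_call(unsigned int idx, void *buf, size_t len)
{
	if (idx >= ext_free_count)
		return (-1);
	ext_free_table[idx](buf, len);
	return (0);
}

/* Example free function that only counts how often it ran. */
static unsigned int demo_free_calls;

void
demo_free(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	demo_free_calls++;
}

unsigned int
demo_free_count(void)
{
	return (demo_free_calls);
}
```

A driver-like caller would register its function once at attach time and keep the index, rather than writing a pointer into shared buffer memory.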


# 1.94 13-Apr-2016 mpi

G/C IFQ_SET_READY().


# 1.93 13-Apr-2016 mpi

G/C IFQ_SET_READY().


Revision tags: OPENBSD_5_9_BASE
# 1.92 11-Dec-2015 mpi

Replace mountroothook_establish(9) by config_mountroot(9) a narrower API
similar to config_defer(9).

ok mikeb@, deraadt@


# 1.91 09-Dec-2015 dlg

rework the if_start mpsafe serialisation so it can serialise arbitrary work

work is represented by struct task.

the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.

this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified in between when a start
routine discovers theres no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.

by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.

tested on various nics
ok mpi@


# 1.90 03-Dec-2015 dlg

tell the stack myx_start is mpsafe.

as per the stack commit, the driver changes are:

1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)


# 1.89 01-Dec-2015 dlg

myx doesnt use atomic.h anymore.


# 1.88 25-Nov-2015 dlg

replace IFF_OACTIVE manipulation with mpsafe operations.

there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.

IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operations mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.

instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.

this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifq_{set,clr,is}_oactive API too.

ok kettenis@ mpi@ jmatthew@ deraadt@


# 1.87 24-Nov-2015 dlg

fix tx ring accounting in myx_start.

turns out i was calculating the number of packets (not descriptors)
on the tx ring, and then using that as the free space for descriptors.


# 1.86 19-Nov-2015 dlg

get rid of sc_tx_free and the atomic ops on it in myx_start and myx_txeof.

myx_start calculates the free space by reading the consumer index
and doing some maths, which lets us avoid the interlocked cpu ops.
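
The "some maths" can be sketched as follows, assuming free-running unsigned producer and consumer counters. The helper name and the counter scheme are assumptions for illustration, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Free descriptors on a ring of ring_size slots, given free-running
 * producer and consumer counters. Unsigned subtraction keeps the
 * result right when the counters wrap. Hypothetical helper, not the
 * driver's code.
 */
uint32_t
ring_free(uint32_t ring_size, uint32_t prod, uint32_t cons)
{
	uint32_t used = prod - cons;

	assert(used <= ring_size);
	return (ring_size - used);
}
```

Because the start side only reads the consumer counter, there is no shared free count that both sides have to update atomically.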


# 1.85 25-Oct-2015 mpi

arp_ifinit() is no longer needed.


# 1.84 29-Sep-2015 dlg

get rid of the mutex between access to the status block and myx_down

myx is unusual in that it has an explicit command to shut down the
chip that gets an interrupt when it's done. so myx_down sends the
command and has to sleep until it gets that interrupt. this moves
to using a single int to represent that state (so loads and stores
are atomic), and sleep_setup/sleep_finish in myx_down to wait for
it to change.

this has been running in production at work for a few months now
tested by chris@


# 1.83 01-Sep-2015 deraadt

free() firmware with right len; ok dlg


# 1.82 15-Aug-2015 dlg

do the global tx free accounting in myx_start with a single atomic op
instead of one per packet.

seems to let me send packets a little faster.
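
A rough sketch of batching the accounting into one atomic operation, using C11 atomics as a stand-in for the kernel's atomic ops. The names and the starting count are hypothetical, and the sketch assumes only this function ever decrements the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_uint tx_free = 64;	/* hypothetical shared free-descriptor count */

/*
 * Claim descriptors for a batch of packets. The per-packet cost is summed
 * in a local variable and subtracted from the shared counter with a single
 * atomic operation at the end. Only this function is assumed to decrement
 * the counter; completions only add to it, so checking against the
 * snapshot cannot over-commit.
 */
unsigned int
tx_claim_batch(const unsigned int *descs_per_pkt, unsigned int npkts)
{
	unsigned int avail = atomic_load(&tx_free);
	unsigned int used = 0, sent = 0;

	assert(descs_per_pkt != NULL || npkts == 0);
	while (sent < npkts && used + descs_per_pkt[sent] <= avail) {
		used += descs_per_pkt[sent];
		sent++;
	}
	if (used > 0)
		atomic_fetch_sub(&tx_free, used);
	return (sent);
}

unsigned int
tx_free_now(void)
{
	return (atomic_load(&tx_free));
}
```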


# 1.81 15-Aug-2015 dlg

rework the tx path to use a ring to keep track of dmamaps/mbufs.

this removes the myx_buf structure and uses myx_slot instead. theyre
the same except slots dont have list entry structures, so theyre
smaller.

this cuts out four mutex ops per packet out of the tx handling.
just have to get rid of the atomic op per packet in myx_start now.


# 1.80 14-Aug-2015 dlg

move to a per rx ring timeout for refilling empty rings.

this lets me get rid of the locking around the refilling of the rx ring.

the timeout only runs refill if the rx ring is empty. we know rxeof
wont try and refill it in that situation because there's no packets
on the ring so we wont get interrupts for it. therefore we dont
need to lock between the timeout and rxeof cos they cant run at the
same time.


# 1.79 14-Aug-2015 dlg

rework how we track the packets on the rx rings.

originally there were two mutex protected lists for rx packets, a
list of free packets, and a list of packets that were on the ring.
filling the ring popped packets off the free list, attached an mbuf
and dmamapped it, and pushed it onto the list of active packets.
the hw fills packets in order, so on rx completion we'd pop packets
off the active list, unmap the mbuf and shove it up the stack before
putting the packet on the free list.

the problem with the lists is that every rx ring operation resulted
in two mutex ops. so 4 mutex ops per packet after you do both fill
and rxeof.

this replaces the mutexed lists with rings that shadow the hardware
rings. filling the rx ring pushes a producer index along, while
rxeof chases it with a consumer. because we know only one thing can
do either of those tasks at a time, we can get away with not using
atomic ops for them.

there's more to be done, but this is a good first step.
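
The shadow ring described above can be sketched like this. It is a simplified model under stated assumptions: the struct layout, names, and ring size are hypothetical, and a plain pointer stands in for the mbuf and dmamap.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 4		/* hypothetical ring size */

struct slot {
	void *buf;		/* stands in for the mbuf and its dmamap */
};

struct rx_ring {
	struct slot slots[RING_SLOTS];
	unsigned int prod;	/* advanced only by the fill side */
	unsigned int cons;	/* advanced only by the completion side */
};

void
ring_init(struct rx_ring *r)
{
	unsigned int i;

	for (i = 0; i < RING_SLOTS; i++)
		r->slots[i].buf = NULL;
	r->prod = r->cons = 0;
}

/* Fill side: put a buffer in the next slot. Returns 0 if the ring is full. */
int
ring_fill(struct rx_ring *r, void *buf)
{
	assert(buf != NULL);
	if (r->prod - r->cons == RING_SLOTS)
		return (0);
	r->slots[r->prod % RING_SLOTS].buf = buf;
	r->prod++;
	return (1);
}

/* Completion side: buffers come back in the order they were posted. */
void *
ring_complete(struct rx_ring *r)
{
	void *buf;

	if (r->cons == r->prod)
		return (NULL);
	buf = r->slots[r->cons % RING_SLOTS].buf;
	r->slots[r->cons % RING_SLOTS].buf = NULL;
	r->cons++;
	return (buf);
}
```

Each index has exactly one writer, which is why the text can get away without mutexes or atomic ops as long as only one context fills and only one completes at a time.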


Revision tags: OPENBSD_5_8_BASE
# 1.78 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.77 17-May-2015 chris

We don't need KERNEL_LOCK() around if_input() anymore, as if_input() has
appropriate locking around bpf now.

ok dlg@


# 1.76 14-Mar-2015 jsg

Remove some includes that include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.75 20-Feb-2015 chris

Now that if_input() is a thing, use it

ok dlg@


# 1.74 18-Feb-2015 dlg

myri employees and their drivers for linux and solaris have repeatedly
told me that if you're going to rx into buffers greater than 4k in
size, they have to be aligned to a 4k boundary.

the mru of this chip is 9k, but ive been using the 12k mcl pool to
provide the alignment. however, if we move to putting 8 items on a
pool page there'll be enough slack space in the mcl12k pool pages
to allow item colouring, which in turn will break the chip requirement
above. in practice the chips i have seem to work fine with unaligned
buffers, but i dont want to risk breaking early revision chips.

this moves myx to using a private pool for allocating clusters for
the big rx ring. the item size is 9k, but we specify a 4k alignment
so every item we get out of it will be correct for the chip.


# 1.73 18-Feb-2015 dlg

enable pcie relaxed transaction ordering and bump the max payload
size up to 4k.

found while reading someone elses driver.


# 1.72 22-Dec-2014 tedu

unifdef INET


# 1.71 28-Oct-2014 dlg

the if_rxring accounting would get screwed up if the first mbuf to
be put on the ring couldnt be allocated.

this pulls the code that puts the mbufs on the ring out of myx_rx_fill
so it can return early if firstmb cant be allocated, which puts it
in the right place to return unused slots to the if_rxring.

this means myx rx wont lock up if you're DoSsed to the point where
you exhaust your mbuf pools and cant allocate mbufs for the ring.

ok jmatthew@


# 1.70 04-Oct-2014 dlg

replace mutexes to serialise the operations on the flag that restricts
the number of contexts that are refilling the rx rings with atomic
ops.

this is borrowed from code i wrote for the scsi midlayer but cant
put in yet because i havent got atomic.h up to scratch on all archs
yet. the archs myx runs on do have enough atomic.h to be fine though.


# 1.69 03-Oct-2014 dlg

refill the rx ring in myx_rxeof, not much later at the end of myx_intr.


# 1.68 03-Oct-2014 dlg

in rxeof, instead of taking the biglock on every packet to call bpf
and ether_input, queue all the mbufs onto an mbuf_list on the stack
and then take the biglock once outside the loop.


# 1.67 03-Oct-2014 dlg

we dont need the kernel lock to call bus_dmamap_load and unload thanks
to kettenis.

move the if_ipackets and if_opackets increments out of biglock too.
theyre only updated from the interrupt handler, which is only run
on a single cpu so there's no chance of the update racing. everywhere
else only reads them.


# 1.66 03-Oct-2014 dlg

dont need to hold the kernel lock to call MCLGETI and m_freem now.


# 1.65 03-Oct-2014 dlg

dont take the kernel lock on every interrupt in case we might change
the link state or to clear OACTIVE, just take it when we know we
really are going to do those things.


# 1.64 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


# 1.63 19-Aug-2014 dlg

in myx_start, replace

	while (space) {
		IFQ_POLL;
		myx_dequeue(free descr);
		IFQ_DEQUEUE;
		etc;
	}

with

	while (space && myx_dequeue(free descr)) {
		IFQ_DEQUEUE;
		etc;
	}


# 1.62 18-Aug-2014 dlg

dont rely on mbuf.h to provide pool.h.

ok miod@, who has offered to help with any MD fallout
ok guenther@


Revision tags: OPENBSD_5_6_BASE
# 1.61 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.60 10-Jul-2014 dlg

rings that dont rx packets dont need to be refilled.


# 1.59 08-Jul-2014 dlg

cut things that relied on mclgeti for rx ring accounting/restriction over
to using if_rxr.

cut the reporting systat did over to the rxr ioctl.

tested as much as i can on alpha, amd64, and sparc64.
mpi@ has run it on macppc.
ok mpi@


# 1.58 17-Jun-2014 dlg

whitespace fix.

im sick of fixing this by hand on all my boxes while hacking on
other stuff and having it pollute my diffs.

no functional change.


# 1.57 24-Mar-2014 dlg

nothing after the irq ack posting relies on it being ordered.


Revision tags: OPENBSD_5_5_BASE
# 1.56 10-Feb-2014 dlg

the mac addresses you program with MYXCMD_SET_MCASTGROUP are in a different
format to the one used for MYXCMD_SET_LLADDR. for reasons.

this lets ospf work if you dont happen to have PROMISC enabled on your
interface like my production firewalls happen to have, which is why i
never noticed this before.


# 1.55 05-Feb-2014 dlg

after running myx(4) without biglock in production for a few days
i discovered that there's a race between the interrupt code and
myx_start which causes the count of free tx descriptors to get
distorted, which eventually leads to a permanent setting of
IFF_OACTIVE, which in turn prevents the driver from transmitting
packets.

fixing that went horribly wrong when i then discovered that there's
a race between the interrupt handler and myx_down, where the interrupt
can tell myx_down to wake up and free all the rings while the
interrupt handler is still looking at them. free panics for all.

this moves the handling of the tx free count under the biglock (for
now), moves myx_up and myx_down to managing a "driver state"
variable independently of the IFF_UP and IFF_RUNNING flags, and
very very carefully reorders the checks of that state variable
and the hardware state.

as a bonus we get to avoid excessive calls to myx_txeof and myx_rxeof
in the isr, and less stuff checked unconditionally. on the other
hand, the sc_state handling added some more checks so it might not
be a win overall.

tested on smp sparc64 with msi and nonmsi interrupts, and on amd64 smp
in production again.


# 1.54 31-Jan-2014 dlg

sc_function is set, but never used for anything useful. clean it up...


# 1.53 31-Jan-2014 dlg

sc_lladdr is never used, so we can get the space in the sc back.


# 1.52 23-Jan-2014 dlg

a lot of people have pointed out to me that taking a lock just to read an
int isnt necessary.


# 1.51 23-Jan-2014 dlg

factor the mutex/bus_space handling of the sts block out.


# 1.50 21-Jan-2014 dlg

introduce fine grained locking.

this doesnt give up the big lock coming from process context, only from
the interrupt side. it is excessively careful about when it takes
the big lock again. notably it goes to a lot of effort to not hold
a mutex while calling into other subsystems or before taking the
big lock.

ive been hitting it as hard as i can without problems.

intensely read by mpi@
ok claudio@ kettenis@


# 1.49 19-Jan-2014 dlg

white space fix


# 1.48 19-Jan-2014 dlg

introduce fine grained locking around the lists of packet handlers
myx maintains. this moves it away from relying on splnet to protect
them.


# 1.47 19-Jan-2014 dlg

hwflags is never used, so clean it up


# 1.46 19-Jan-2014 dlg

replace bcmp with memcmp


# 1.45 19-Jan-2014 dlg

bcopy to memcpy


# 1.44 19-Jan-2014 dlg

replace bzero with memset.


# 1.43 19-Jan-2014 dlg

all 64bit archs myx runs on support bus_space 8 things because of work i
did at n2k13.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.42 29-Jan-2013 brad

- Set ENETRESET within myx_ioctl() instead of calling myx_iff() directly, to be
consistent with other drivers.
- Clear IFF_ALLMULTI flag early and at the top of myx_iff().
- Set IFF_ALLMULTI when in promisc mode.

ok dlg@


# 1.41 25-Jan-2013 dlg

we go to a lot of effort to post the first tx descriptor last, but we
really should be trying to post everything except the flags field in the
first tx descriptor. this shuffles things around so the rest of that first
txd is posted as part of the "everything else" before its flags field.


# 1.40 25-Jan-2013 dlg

the myx_dmamem struct doesnt need a name.


# 1.39 21-Jan-2013 dlg

myx does reads and writes in one direction to packet buffers. lets try
STREAMING them.


# 1.38 15-Jan-2013 dlg

dont use bus_space_write_raw_region_8: amd64 is currently broken cos it
doesnt have it. disabling it for now.


# 1.37 15-Jan-2013 dlg

use bus_space_write_raw_region_8 on 64bit archs when writing to the rings


# 1.36 14-Jan-2013 dlg

map the registers PREFETCHABLE so things that can do write combining can
try and do write combining like the myx doco likes.


# 1.35 14-Jan-2013 dlg

avoid extra bus_space barriers in the interrupt handler.


# 1.34 14-Jan-2013 dlg

when posting descriptors to the chips rings, avoid going write barrier
write barrier write barrier when using myx_write to post descriptors.

instead let it go write write write barrier by using the appropriate
bus_space write directly followed by a single bus_space barrier.

the story above is mostly true, except that myx wants us to write all the
descriptors except the first, barrier, and then write the first one out to
signal that the chip can proceed.

it is also worth noting that the barriers cover more address space than
what we actually wrote to. this makes the code much simpler, and avoids
generating extra fence operations (which is what barrier functions end up
as on most of our archs) when we wrap around the end of the ring. the
bus_space doco encourages this.

bus_space use was discussed with krw@ kettenis@ deraadt@
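
The posting order described above (everything except the first descriptor, one barrier, then the first descriptor) can be sketched as below. This is a userland model only: a plain array stands in for the chip's ring, a C11 fence stands in for the bus_space barrier, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8				/* hypothetical ring size */

static volatile uint64_t ring[RING_SIZE];	/* stands in for the chip's ring */
static unsigned int last_written;		/* slot of the most recent store */

static void
ring_write(unsigned int slot, uint64_t desc)
{
	ring[slot % RING_SIZE] = desc;
	last_written = slot % RING_SIZE;
}

/*
 * Post n descriptors starting at slot first: everything except the first
 * descriptor, one barrier, then the first descriptor. The C11 fence is only
 * a stand-in for the bus_space barrier a driver would use.
 */
void
ring_post(unsigned int first, const uint64_t *descs, unsigned int n)
{
	unsigned int i;

	assert(n >= 1 && n <= RING_SIZE);
	for (i = 1; i < n; i++)
		ring_write(first + i, descs[i]);
	atomic_thread_fence(memory_order_release);
	ring_write(first, descs[0]);
}

uint64_t
ring_read(unsigned int slot)
{
	return (ring[slot % RING_SIZE]);
}

unsigned int
ring_last_written(void)
{
	return (last_written);
}
```

The example wraps around the end of the ring on purpose; a single barrier still covers all of the earlier writes, which matches the point made above about not issuing one barrier per write.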


# 1.33 14-Jan-2013 dlg

the myri doco suggests its nice to post stuff by filling in everything
in the rings except the first descriptor. once you've written as
much as you can out, then you go back and post the first descriptor
to signal that the chip should go ahead and work.


# 1.32 14-Jan-2013 dlg

;; is a long way of saying ;


# 1.31 29-Nov-2012 brad

Remove setting an initial assumed baudrate upon driver attach which is not
necessarily correct, there might not even be a link when attaching.

ok mikeb@ reyk@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.30 28-Nov-2011 blambert

Fix reversed error-handling gotos in myx_buf_fill(), which would lead to
either an mbuf leak or a NULL pointer dereference.

ok sthen@ claudio@ dlg@
testing claudio@ dlg@


Revision tags: OPENBSD_5_0_BASE
# 1.29 08-Aug-2011 dlg

myx requires the driver pad short ethernet frames to 60 bytes by
adding a descriptor pointing at zeroed bytes onto the end of transmit
chains. i was accounting for this extra descriptor when i was
completing the chain, but not when i was setting this up. this
meant the number of free descriptors kept growing until it overflowed.
at this point the check for space in the ring failed and packets
no longer flowed.

this counts the pad descriptor in the tx chain setup too.

ok deraadt@
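
A small sketch of the accounting rule from the message above. The helper name is hypothetical and the 60 byte minimum is taken from the text; the point is that setup and completion must use the same count.

```c
#include <assert.h>

#define EX_ETHER_MIN_NOFCS 60	/* minimum frame length from the text above */

/*
 * Descriptors a packet occupies on the tx ring: one per DMA segment, plus
 * one pointing at zeroed padding when the frame is shorter than 60 bytes.
 * Hypothetical helper; the same count has to be used when the chain is set
 * up and when it is completed, or the free count drifts.
 */
unsigned int
tx_descs_needed(unsigned int nsegs, unsigned int pktlen)
{
	assert(nsegs >= 1);
	return (nsegs + (pktlen < EX_ETHER_MIN_NOFCS ? 1 : 0));
}
```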


# 1.28 23-Jun-2011 dlg

cope with empty rx rings by scheduling a timeout to keep trying until it
gets some packets onto the rings.

also annoying, but the hardware doesnt report empty rings, we have to
handle it ourselves.


# 1.27 23-Jun-2011 dlg

this chip has an annoying "feature" where it cannot report the link
state unless the chip is up and handling packets. while its down
it does not report the link state, so it is unknown.

this tweaks the link state handling, in particular it adds code to
myx_down so it moves the link state to unknown, ie, it correctly
reflects reality.

stupidity pointed out by deraadt


# 1.26 22-Jun-2011 deraadt

reset the tx_count on UP, since it may have been advanced from non-zero
by a previous use
ok claudio


# 1.25 22-Jun-2011 dlg

msi support. this is a complicated one...

ok kettenis@


# 1.24 22-Jun-2011 jsg

another myri10ge device matched by freebsd/linux drivers
ok dlg@


# 1.23 22-Jun-2011 dlg

oops, handle refill like i said i was going to two revisions ago.


# 1.22 22-Jun-2011 deraadt

set the mac address on the chip correctly (repair the byte order)
it now works on sparc64, too
ok dlg


# 1.21 22-Jun-2011 dlg

deraadt plugged his myx into a sparc64 and discovered 3 problems:

1. we want to write raw values to registers all the time, so promote the
myx_raw{read,write} to myx_{read,write} and use them everywhere. get rid
of the raw funcs.
2. i was setting the watermarks on the rx ring before knowing how big
they were.
3. rxfill in the interrupt handler could lose data if you loop on
sts_isvalid.

almost working now...

"please commit your diff" deraadt@


# 1.20 21-Jun-2011 dlg

do the unaligned dma tests so we can figure out if we need to fall
back to the unaligned firmware. apparently this is only an issue
on the "A" controllers which have been superseded, but those are
the chips we (openbsd devs) have.


# 1.19 21-Jun-2011 dlg

report the controllers part number. eg, i now know i have a
10G-PCIE-8A-R. dmesg looks like this:

myx0 at pci4 dev 0 function 0 "Myricom Z8E" rev 0x00: apic 1 int 8, model 10G-PCIE-8A-R, address 00:60:dd:47:c6:74


# 1.18 21-Jun-2011 dlg

wire up jumbos properly. the hardware supports up to 9018 bytes off
the wire (9000 + ether header + vlan tag), but has some cool alignment
requirements. if you want to use a single rx ring desc to point at
a jumbo it needs to start on a 4k boundary and be physically
contiguous. to ensure this im pulling frames from the 12k pool and
waiting for arianes diff to ensure mbufs are contig.

direction from andrew gallatin. tested locally.
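
The arithmetic and the alignment requirement above can be checked with a short sketch. The macro and function names are hypothetical; the sizes come from the message (9000 byte payload, ethernet header, vlan tag, 4k boundary, 12k clusters).

```c
#include <assert.h>
#include <stdint.h>

#define EX_JUMBO_PAYLOAD	9000u
#define EX_ETHER_HDR_LEN	14u	/* dst + src + ethertype */
#define EX_VLAN_TAG_LEN		4u	/* 802.1Q tag */
#define EX_RX_ALIGN		4096u	/* 4k boundary from the text */

/* Largest frame off the wire: payload + ethernet header + vlan tag. */
unsigned int
jumbo_frame_len(void)
{
	return (EX_JUMBO_PAYLOAD + EX_ETHER_HDR_LEN + EX_VLAN_TAG_LEN);
}

/*
 * True if a single rx descriptor could point at this buffer: it starts on
 * a 4k boundary and the cluster is big enough for a full jumbo frame.
 * Physical contiguity is not modelled here.
 */
int
rx_buf_ok(uint64_t addr, unsigned int cluster_len)
{
	assert(cluster_len > 0);
	return ((addr & (EX_RX_ALIGN - 1)) == 0 &&
	    cluster_len >= jumbo_frame_len());
}
```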


# 1.17 21-Jun-2011 deraadt

minor cleanups; ok dlg


# 1.16 20-Jun-2011 dlg

make the interrupt handler look more like what the doco suggests. seems to
fix a bad lockup i kept getting.


# 1.15 20-Jun-2011 dlg

dont need debug, the myx_cmd stuff works fine.


# 1.14 20-Jun-2011 dlg

i got myx working!


# 1.13 02-May-2011 chl

Do not check malloc return value against NULL, as M_WAITOK is used.

ok dlg@ krw@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.12 19-May-2010 oga

BUS_DMA_ZERO instead of alloc, map, bzero.

ok krw@


Revision tags: OPENBSD_4_7_BASE
# 1.11 13-Aug-2009 jasper

- consistify cfdriver for the ethernet drivers (0 -> NULL)

ok dlg@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.10 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.9 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.8 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.7 23-May-2008 brad

Simplify the combination use of pci_mapreg_type()/pci_mapreg_map() as
suggested by dlg@ awhile ago.

ok dlg@


Revision tags: OPENBSD_4_3_BASE
# 1.6 16-Jan-2008 thib

Set the baudrate with IF_Gbps(10); and remove an
XXX comment now that if_baudrate is 64bits.

ok reyk@


Revision tags: OPENBSD_4_2_BASE
# 1.5 01-Jun-2007 reyk

initialize the rings


# 1.4 31-May-2007 reyk

further improvement of the bus space i/o. firmware loading, booting,
and card initialization works now.

thanks to dlg@ who pointed me to the fact that
bus_space_write_region_N and bus_space_write_raw_region_N use count of
elements vs. size of buffer arguments.
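
A sketch of the count-versus-size distinction, reading the message as: one family of calls wants a count of elements and the other a size in bytes. The helper below is hypothetical and does not call bus_space; it only shows the conversion that is easy to get wrong.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Turn a buffer length in bytes into the element count that a region write
 * of width-byte elements expects. Passing the byte length where a count is
 * wanted would write width times too much. Hypothetical helper.
 */
size_t
region_count(size_t len_bytes, size_t width)
{
	assert(width == 1 || width == 2 || width == 4 || width == 8);
	assert(len_bytes % width == 0);
	return (len_bytes / width);
}
```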


# 1.3 31-May-2007 reyk

enable all debugging messages by default if the driver is compiled with
MYX_DEBUG


# 1.2 31-May-2007 reyk

fix the myx_write function


# 1.1 31-May-2007 reyk

initial bits of a new driver for the Myricom Myri-10G Lanai-Z8E 10Gb
Ethernet chipset. not working yet.

ok dlg@


Revision tags: OPENBSD_6_1_BASE
# 1.102 07-Feb-2017 dlg

move the mbuf pools to m_pool_init and a single global memory limit

this replaces individual calls to pool_init, pool_set_constraints, and
pool_sethardlimit with calls to m_pool_init. m_pool_init inits the
mbuf pools with the mbuf pool allocator, and because of that doesnt
set per pool limits.

ok bluhm@ as part of a larger diff


# 1.101 24-Jan-2017 dlg

add support for multiple transmit ifqueues per network interface.

an ifq to transmit a packet is picked by the current traffic
conditioner (ie, priq or hfsc) by providing an index into an array
of ifqs. by default interfaces get a single ifq but can ask for
more using if_attach_queues().

the vast majority of our drivers still think there's a 1:1 mapping
between interfaces and transmit queues, so their if_start routines
take an ifnet pointer instead of a pointer to the ifqueue struct.
instead of changing all the drivers in the tree, drivers can opt
into using an if_qstart routine and setting the IFXF_MPSAFE flag.
the stack provides a compatability wrapper from the new if_qstart
handler to the previous if_start handlers if IFXF_MPSAFE isnt set.

enabling hfsc on an interface configures it to transmit everything
through the first ifq. any other ifqs are left configured as priq,
but unused, when hfsc is enabled.

getting this in now so everyone can kick the tyres.

ok mpi@ visa@ (who provided some tweaks for cnmac).


# 1.100 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.99 31-Oct-2016 dlg

turns out these chips can handle buffers up to 9400 bytes in length.

raise the mtu to 9380 bytes so we can take advantage of the extra space.

i need to revisit the macro names at some point.


# 1.98 31-Oct-2016 dlg

revert 1.97 where i moved myx to using the system pools

my early revision board doesnt like it at all


# 1.97 28-Oct-2016 dlg

get rid of the custom pool in myx for jumbo frames.

now it asks the mbuf layer for the 9k from its pools.

a question from chris@ made me go look at the chip doco again and i
realised that the chip only requires 4 byte alignment for rx buffers,
no 4k alignment for jumbo buffers.

i also found that the chip is supposed to be able to rx up to 9400
bytes instead of 9000. ill fix that later though.


# 1.96 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


Revision tags: OPENBSD_6_0_BASE
# 1.95 23-May-2016 tedu

remove the function pointer from mbufs. this memory is shared with data
via unions, and we don't want to make it easy to control the target.
instead an integer index into an array of acceptable functions is used.
drivers using custom functions must register them to receive an index.
ok deraadt


# 1.94 13-Apr-2016 mpi

G/C IFQ_SET_READY().


# 1.93 13-Apr-2016 mpi

G/C IFQ_SET_READY().


Revision tags: OPENBSD_5_9_BASE
# 1.92 11-Dec-2015 mpi

Replace mountroothook_establish(9) by config_mountroot(9) a narrower API
similar to config_defer(9).

ok mikeb@, deraadt@


# 1.91 09-Dec-2015 dlg

rework the if_start mpsafe serialisation so it can serialise arbitrary work

work is represented by struct task.

the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.

this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified in between when a start
routine discovers theres no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.

by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.

tested on various nics
ok mpi@


# 1.90 03-Dec-2015 dlg

tell the stack myx_start is mpsafe.

as per the stack commit, the driver changes are:

1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)


# 1.89 01-Dec-2015 dlg

myx doesnt use atomic.h anymore.


# 1.88 25-Nov-2015 dlg

replace IFF_OACTIVE manipulation with mpsafe operations.

there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.

IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operations mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.
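
The clobbering described above can be replayed by hand. This is a userland sketch, not kernel code: the flag names and values are made up for illustration, and the two "cpus" are just consecutive statements standing in for one bad interleaving.

```c
#include <assert.h>
#include <stdint.h>

/* illustrative flag values, not the kernel's definitions */
#define FLAG_OACTIVE	0x0001
#define FLAG_ALLMULTI	0x0002

/*
 * replay one bad interleaving of two unlocked read-modify-write
 * cycles on a shared flags word and return what ends up stored.
 */
static uint16_t
flags_lost_update(uint16_t flags)
{
	uint16_t stale;

	/* the demonstration starts with oactive clear */
	assert((flags & FLAG_OACTIVE) == 0);

	stale = flags;				/* cpu A loads the word */
	flags |= FLAG_OACTIVE;			/* cpu B sets oactive and stores */
	flags = stale | FLAG_ALLMULTI;		/* cpu A stores its stale copy */

	return (flags);
}
```

The returned word has the hypothetical FLAG_ALLMULTI set but has lost FLAG_OACTIVE, which is why making only the oactive update atomic would not be enough while it shares a word with other flags.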

instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.

this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifq_{set,clr,is}_oactive API too.

ok kettenis@ mpi@ jmatthew@ deraadt@


# 1.87 24-Nov-2015 dlg

fix tx ring accounting in myx_start.

turns out i was calculating the number of packets (not descriptors)
on the tx ring, and then using that as the free space for descriptors.


# 1.86 19-Nov-2015 dlg

get rid of sc_tx_free and the atomic ops on it in myx_start and myx_txeof.

myx_start calculates the free space by reading the consumer index
and doing some maths, which lets us avoid the interlocked cpu ops.
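
A rough sketch of the kind of maths involved, assuming free-running unsigned producer and consumer counters that both count descriptors. The helper name and parameters are hypothetical and this is not the code from myx_start.

```c
#include <assert.h>
#include <stdint.h>

/*
 * hypothetical helper: number of free descriptors on a tx ring,
 * given free-running producer and consumer counters.
 */
static inline uint32_t
tx_ring_free(uint32_t prod, uint32_t cons, uint32_t ring_size)
{
	/* unsigned subtraction stays correct when prod has wrapped */
	uint32_t used = prod - cons;

	/* the producer must never get more than a ring ahead */
	assert(used <= ring_size);

	return (ring_size - used);
}
```

With a 128 entry ring, tx_ring_free(10, 4, 128) is 122, and tx_ring_free(2, 0xfffffffe, 128) is 124 because the subtraction wraps to 4. Only the start side writes prod and only the completion side writes cons, so no interlocked operation is needed to get a conservative answer.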


# 1.85 25-Oct-2015 mpi

arp_ifinit() is no longer needed.


# 1.84 29-Sep-2015 dlg

get rid of the mutex between access to the status block and myx_down

myx is unusual in that it has an explicit command to shut down the
chip that gets an interrupt when it's done. so myx_down sends the
command and has to sleep until it gets that interrupt. this moves
to using a single int to represent that state (so loads and stores
are atomic), and sleep_setup/sleep_finish in myx_down to wait for
it to change.

this has been running in production at work for a few months now
tested by chris@


# 1.83 01-Sep-2015 deraadt

free() firmware with right len; ok dlg


# 1.82 15-Aug-2015 dlg

do the global tx free accounting in myx_start with a single atomic op
instead of one per packet.

seems to let me send packets a little faster.


# 1.81 15-Aug-2015 dlg

rework the tx path to use a ring to keep track of dmamaps/mbufs.

this removes the myx_buf structure and uses myx_slot instead. theyre
the same except slots dont have list entry structures, so theyre
smaller.

this cuts out four mutex ops per packet out of the tx handling.
just have to get rid of the atomic op per packet in myx_start now.


# 1.80 14-Aug-2015 dlg

move to a per rx ring timeout for refilling empty rings.

this lets me get rid of the locking around the refilling of the rx ring.

the timeout only runs refill if the rx ring is empty. we know rxeof
wont try and refill it in that situation because there's no packets
on the ring so we wont get interrupts for it. therefore we dont
need to lock between the timeout and rxeof cos they cant run at the
same time.


# 1.79 14-Aug-2015 dlg

rework how we track the packets on the rx rings.

originally there were two mutex protected lists for rx packets, a
list of free packets, and a list of packets that were on the ring.
filling the ring popped packets off the free list, attached an mbuf
and dmamapped it, and pushed it onto the list of active packets.
the hw fills packets in order, so on rx completion we'd pop packets
off the active list, unmap the mbuf and shove it up the stack before
putting the packet on the free list.

the problem with the lists is that every rx ring operation resulted
in two mutex ops. so 4 mutex ops per packet after you do both fill
and rxeof.

this replaces the mutexed lists with rings that shadow the hardware
rings. filling the rx ring pushes a producer index along, while
rxeof chases it with a consumer. because we know only one thing can
do either of those tasks at a time, we can get away with not using
atomic ops for them.
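
A minimal userland sketch of a ring tracked by a producer and a consumer index, assuming a single filler and a single completer as described above. The struct and function names, and the ring size, are made up for illustration and are not the driver's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS	8	/* illustrative size, a power of two */

struct shadow_ring {
	uint32_t	 prod;			/* only written by the fill side */
	uint32_t	 cons;			/* only written by the rxeof side */
	void		*slots[RING_SLOTS];
};

/* fill side: returns 0 if the ring is full, 1 if the buffer was queued */
static int
shadow_ring_put(struct shadow_ring *r, void *m)
{
	assert(m != NULL);

	if (r->prod - r->cons == RING_SLOTS)
		return (0);

	r->slots[r->prod % RING_SLOTS] = m;
	r->prod++;
	return (1);
}

/* completion side: returns the oldest buffer, or NULL if the ring is empty */
static void *
shadow_ring_get(struct shadow_ring *r)
{
	void *m;

	if (r->cons == r->prod)
		return (NULL);

	m = r->slots[r->cons % RING_SLOTS];
	r->cons++;
	return (m);
}
```

Buffers come back out in the order they went in, which matches the hardware filling packets in order. Each index has exactly one writer, so neither side takes a mutex.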

there's more to be done, but this is a good first step.


Revision tags: OPENBSD_5_8_BASE
# 1.78 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.77 17-May-2015 chris

We don't need KERNEL_LOCK() around if_input() anymore, as if_input() has
appropriate locking around bpf now.

ok dlg@


# 1.76 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.75 20-Feb-2015 chris

Now that if_input() is a thing, use it

ok dlg@


# 1.74 18-Feb-2015 dlg

myri employees and their drivers for linux and solaris have repeatedly
told me that if you're going to rx into buffers greater than 4k in
size, they have to be aligned to a 4k boundary.

the mru of this chip is 9k, but ive been using the 12k mcl pool to
provide the alignment. however, if we move to putting 8 items on a
pool page there'll be enough slack space in the mcl12k pool pages
to allow item colouring, which in turn will break the chip requirement
above. in practice the chips i have seem to work fine with unaligned
buffers, but i dont want to risk breaking early revision chips.

this moves myx to using a private pool for allocating clusters for
the big rx ring. the item size is 9k, but we specify a 4k alignment
so every item we get out of it will be correct for the chip.
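
The size versus alignment split can be shown in userland with posix_memalign(). This is only an analogy for the private pool: the macro and function names are hypothetical, and the kernel pool API is not used here.

```c
#define _POSIX_C_SOURCE 200112L	/* for posix_memalign() */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define RX_BUF_SIZE	(9 * 1024)	/* 9k item, as in the message above */
#define RX_BUF_ALIGN	4096		/* 4k boundary */

/* hypothetical helper: allocate one 9k buffer on a 4k boundary */
static void *
rx_buf_alloc(void)
{
	void *buf;

	if (posix_memalign(&buf, RX_BUF_ALIGN, RX_BUF_SIZE) != 0)
		return (NULL);

	/* the low 12 bits of the address must be clear */
	assert(((uintptr_t)buf & (RX_BUF_ALIGN - 1)) == 0);

	return (buf);
}
```

The item stays 9k, so nothing forces a 12k allocation just to get the alignment. Buffers from this sketch are released with free().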


# 1.73 18-Feb-2015 dlg

enable pcie relaxed transaction ordering and bump the max payload
size up to 4k.

found while reading someone elses driver.


# 1.72 22-Dec-2014 tedu

unifdef INET


# 1.71 28-Oct-2014 dlg

the if_rxring accounting would get screwed up if the first mbuf to
be put on the ring couldnt be allocated.

this pulls the code that puts the mbufs on the ring out of myx_rx_fill
so it can return early if firstmb cant be allocated, which puts it
in the right place to return unused slots to the if_rxring.

this means myx rx wont lock up if you're DoSsed to the point where
you exhaust your mbuf pools and cant allocate mbufs for the ring.

ok jmatthew@


# 1.70 04-Oct-2014 dlg

replace mutexes to serialise the operations on the flag that restricts
the number of contexts that are refilling the rx rings with atomic
ops.

this is borrowed from code i wrote for the scsi midlayer but cant
put in yet because i havent got atomic.h up to scratch on all archs
yet. the archs myx runs on do have enough atomic.h to be fine though.


# 1.69 03-Oct-2014 dlg

refill the rx ring in myx_rxeof, not much later at the end of myx_intr.


# 1.68 03-Oct-2014 dlg

in rxeof, instead of taking the biglock on every packet to call bpf
and ether_input, queue all the mbufs onto an mbuf_list on the stack
and then take the biglock once outside the loop.


# 1.67 03-Oct-2014 dlg

we dont need the kernel lock to call bus_dmamap_load and unload thanks
to kettenis.

move the if_ipacket and if_opacket increments out of biglock too.
theyre only updated from the interrupt handler, which is only run
on a single cpu so there's no chance of the update racing. everywhere
else only reads them.


# 1.66 03-Oct-2014 dlg

dont need to hold the kernel lock to call MCLGETI and m_freem now.


# 1.65 03-Oct-2014 dlg

dont take the kernel lock on every interrupt in case we might change
the link state or to clear OACTIVE, just take it when we know we
really are going to do those things.


# 1.64 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


# 1.63 19-Aug-2014 dlg

in myx_start, replace

while (space) {
IFQ_POLL;
myx_dequeue(free descr);
IFQ_DEQUEUE;
etc;
}

with

while (space && myx_dequeue(free descr)) {
IFQ_DEQUEUE;
etc;
}


# 1.62 18-Aug-2014 dlg

dont rely on mbuf.h to provide pool.h.

ok miod@, who has offered to help with any MD fallout
ok guenther@


Revision tags: OPENBSD_5_6_BASE
# 1.61 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.60 10-Jul-2014 dlg

rings that dont rx packets dont need to be refilled.


# 1.59 08-Jul-2014 dlg

cut things that relied on mclgeti for rx ring accounting/restriction over
to using if_rxr.

cut the reporting systat did over to the rxr ioctl.

tested as much as i can on alpha, amd64, and sparc64.
mpi@ has run it on macppc.
ok mpi@


# 1.58 17-Jun-2014 dlg

whitespace fix.

im sick of fixing this by hand on all my boxes while hacking on
other stuff and having it pollute my diffs.

no functional change.


# 1.57 24-Mar-2014 dlg

nothing after the irq ack posting relies on it being ordered.


Revision tags: OPENBSD_5_5_BASE
# 1.56 10-Feb-2014 dlg

the mac addresses you program with MYXCMD_SET_MCASTGROUP are in a different
format to the one used for MYXCMD_SET_LLADDR. for reasons.

this lets ospf work if you dont happen to have PROMISC enabled on your
interface like my production firewalls happen to have, which is why i
never noticed this before.


# 1.55 05-Feb-2014 dlg

after running myx(4) without biglock in production for a few days
i discovered that there's a race between the interrupt code and
myx_start which causes the count of free tx descriptors to get
distorted, which eventually leads to a permanent setting of
IFF_OACTIVE, which in turn prevents the driver from transmitting
packets.

fixing that went horribly wrong when i then discovered that there's
a race between the interrupt handler and myx_down, where the interrupt
can tell myx_down to wake up and free all the rings while the
interrupt handler is still looking at them. free panics for all.

this moves the handling of the tx free count under the biglock (for
now), and moves myx_up and myx_down to managing a "driver state"
variable independently of the IFF_UP and IFF_RUNNING flags, and
very very careful reordering of the checks of that state variable
and the hardware state.

as a bonus we get to avoid excessive calls to myx_txeof and myx_rxeof
in the isr, and less stuff checked unconditionally. on the other
hand, the sc_state handling added some more checks so it might not
be a win overall.

tested on smp sparc64 with msi and nonmsi interrupts, and on amd64 smp
in production again.


# 1.54 31-Jan-2014 dlg

sc_function is set, but never used for anything useful. clean it up...


# 1.53 31-Jan-2014 dlg

sc_lladdr is never used, so we can get the space in the sc back.


# 1.52 23-Jan-2014 dlg

a lot of people have pointed out to me that taking a lock just to read an
int isnt necessary.


# 1.51 23-Jan-2014 dlg

factor the mutex/bus_space handling of the sts block out.


# 1.50 21-Jan-2014 dlg

introduce fine grained locking.

this doesnt give up the big lock coming from process context, only from
the interrupt side. it is excessively careful about when it takes
the big lock again. notably it goes to a lot of effort to not hold
a mutex while calling into other subsystems or before taking the
big lock.

ive been hitting it as hard as i can without problems.

intensely read by mpi@
ok claudio@ kettenis@


# 1.49 19-Jan-2014 dlg

white space fix


# 1.48 19-Jan-2014 dlg

introduce fine grained locking around the lists of packet handlers
myx maintains. this moves it away from relying on splnet to protect
them.


# 1.47 19-Jan-2014 dlg

hwflags is never used, so clean it up


# 1.46 19-Jan-2014 dlg

replace bcmp with memcmp


# 1.45 19-Jan-2014 dlg

bcopy to memcpy


# 1.44 19-Jan-2014 dlg

replace bzero with memset.


# 1.43 19-Jan-2014 dlg

all 64bit archs myx runs on support bus_space 8 things because of work i
did at n2k13.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.42 29-Jan-2013 brad

- Set ENETRESET within myx_ioctl() instead of calling myx_iff() directly, to be
consistent with other drivers.
- Clear IFF_ALLMULTI flag early and at the top of myx_iff().
- Set IFF_ALLMULTI when in promisc mode.

ok dlg@


# 1.41 25-Jan-2013 dlg

we go to a lot of effort to post the first tx descriptor last, but we
really should be trying to post everything except the flags field in the
first tx descriptor. this shuffles things around so the rest of that first
txd is posted as part of the "everything else" before its flags field.


# 1.40 25-Jan-2013 dlg

the myx_dmamem struct doesnt need a name.


# 1.39 21-Jan-2013 dlg

myx does reads and writes in one direction to packet buffers. lets try
STREAMING them.


# 1.38 15-Jan-2013 dlg

amd64 is currently broken cos it has no bus_space_write_raw_region_8.
disabling its use for now.


# 1.37 15-Jan-2013 dlg

use bus_space_write_raw_region_8 on 64bit archs when writing to the rings


# 1.36 14-Jan-2013 dlg

map the registers PREFETCHABLE so things that can do write combining can
try and do write combining like the myx doco likes.


# 1.35 14-Jan-2013 dlg

avoid extra bus_space barriers in the interrupt handler.


# 1.34 14-Jan-2013 dlg

when posting descriptors to the chips rings, avoid going write barrier
write barrier write barrier when using myx_write to post descriptors.

instead lets it go write write write barrier by using the appropriate
bus_space write directly followed by a single bus_space barrier.

the story above is mostly true, except that myx wants us to write all the
descriptors except the first, barrier, and then write the first one out to
signal that the chip can proceed.

it is also worth noting that the barriers cover more address space than
what we actually wrote to. this makes the code much simpler, and avoids
generating extra fence operations (which is what barrier functions end up
as on most of our archs) when we wrap around the end of the ring. the
bus_space doco encourages this.
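
The ordering above (everything but the first descriptor, one barrier, then the first descriptor) can be sketched in userland with C11 atomics. This is an analogy under stated assumptions: a release store stands in for the bus_space barrier, the ring is plain memory rather than device registers, and ring_post is a made-up name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * hypothetical helper: post n descriptors so that slot 0 only
 * becomes visible after slots 1..n-1 have been written.
 */
static void
ring_post(_Atomic uint64_t *ring, const uint64_t *descs, size_t n)
{
	size_t i;

	assert(n > 0);

	/* write write write: no ordering between these stores */
	for (i = 1; i < n; i++) {
		atomic_store_explicit(&ring[i], descs[i],
		    memory_order_relaxed);
	}

	/* single ordering point: the stores above happen before this one */
	atomic_store_explicit(&ring[0], descs[0], memory_order_release);
}
```

A consumer that observes slot 0 with an acquire load is then guaranteed to see the other slots too, which is the role the first descriptor plays for the chip.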

bus_space use was discussed with krw@ kettenis@ deraadt@


# 1.33 14-Jan-2013 dlg

the myri doco suggests its nice to post stuff by filling in everything
in the rings except the first descriptor. once you've written as
much as you can out, then you go back and post the first descriptor
to signal that the chip should go ahead and work.


# 1.32 14-Jan-2013 dlg

;; is a long way of saying ;


# 1.31 29-Nov-2012 brad

Remove setting an initial assumed baudrate upon driver attach which is not
necessarily correct, there might not even be a link when attaching.

ok mikeb@ reyk@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.30 28-Nov-2011 blambert

Fix reversed error-handling gotos in myx_buf_fill(), which would lead to
either an mbuf leak or a NULL pointer dereference.

ok sthen@ claudio@ dlg@
testing claudio@ dlg@


Revision tags: OPENBSD_5_0_BASE
# 1.29 08-Aug-2011 dlg

myx requires the driver pad short ethernet frames to 60 bytes by
adding a descriptor pointing at zeroed bytes onto the end of transmit
chains. i was accounting for this extra descriptor when i was
completing the chain, but not when i was setting this up. this
meant the number of free descriptors kept growing until it overflowed.
at this point the check for space in the ring failed and packets
no longer flowed.

this counts the pad descriptor in the tx chain setup too.
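
The accounting rule can be written down as a small helper. This is a sketch only: the names are hypothetical, and it assumes one descriptor per dma segment plus one pad descriptor for frames shorter than 60 bytes, as described above.

```c
#include <assert.h>

#define TX_MIN_LEN	60	/* short frames are padded up to this */

/*
 * hypothetical helper: descriptors a packet consumes on the tx ring,
 * counting the extra pad descriptor for short frames.
 */
static unsigned int
tx_descs_needed(unsigned int nsegs, unsigned int pktlen)
{
	assert(nsegs > 0);

	return (nsegs + (pktlen < TX_MIN_LEN ? 1 : 0));
}
```

Using the same function when descriptors are taken from the ring and when they are given back on completion keeps the two sides of the free count in step.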

ok deraadt@


# 1.28 23-Jun-2011 dlg

cope with empty rx rings by scheduling a timeout to keep trying until it
gets some packets onto the rings.

also annoying, but the hardware doesnt report empty rings, we have to
handle it ourselves.


# 1.27 23-Jun-2011 dlg

this chip has an annoying "feature" where it cannot report the link
state unless the chip is up and handling packets. while its down
it does not report the link state, so it is unknown.

this tweaks the link state handling, in particular it adds code to
myx_down so it moves the link state to unknown, ie, it correctly
reflects reality.

stupidity pointed out by deraadt


# 1.26 22-Jun-2011 deraadt

reset the tx_count on UP, since it may have been advanced from non-zero
by a previous use
ok claudio


# 1.25 22-Jun-2011 dlg

msi support. this is a complicated one...

ok kettenis@


# 1.24 22-Jun-2011 jsg

another myri10ge device matched by freebsd/linux drivers
ok dlg@


# 1.23 22-Jun-2011 dlg

oops, handle refill like i said i was going to two revisions ago.


# 1.22 22-Jun-2011 deraadt

set the mac address on the chip correctly (repair the byte order)
it now works on sparc64, too
ok dlg


# 1.21 22-Jun-2011 dlg

deraadt plugged his myx into a sparc64 and discovered 3 problems:

1. we want to write raw values to registers all the time, so promote the
myx_raw{read,write} to myx_{read,write} and use them everywhere. get rid
of the raw funcs.
2. i was setting the watermarks on the rx ring before knowing how big
they were.
3. rxfill in the interrupt handler could lose data if you loop on
sts_isvalid.

almost working now...

"please commit your diff" deraadt@


# 1.20 21-Jun-2011 dlg

do the unaligned dma tests so we can figure out if we need to fall
back to the unaligned firmware. apparently this is only an issue
on the "A" controllers which have been superseded, but those are
the chips we (openbsd devs) have.


# 1.19 21-Jun-2011 dlg

report the controllers part number. eg, i now know i have a
10G-PCIE-8A-R. dmesg looks like this:

myx0 at pci4 dev 0 function 0 "Myricom Z8E" rev 0x00: apic 1 int 8, model 10G-PCIE-8A-R, address 00:60:dd:47:c6:74


# 1.18 21-Jun-2011 dlg

wire up jumbos properly. the hardware supports up to 9018 bytes off
the wire (9000 + ether header + vlan tag), but has some cool alignment
requirements. if you want to use a single rx ring desc to point at
a jumbo it needs to start on a 4k boundary and be physically
contiguous. to ensure this im pulling frames from the 12k pool and
waiting for arianes diff to ensure mbufs are contig.

direction from andrew gallatin. tested locally.


# 1.17 21-Jun-2011 deraadt

minor cleanups; ok dlg


# 1.16 20-Jun-2011 dlg

make the interrupt handler look more like what the doco suggests. seems to
fix a bad lockup i kept getting.


# 1.15 20-Jun-2011 dlg

dont need debug, the myx_cmd stuff works fine.


# 1.14 20-Jun-2011 dlg

i got myx working!


# 1.13 02-May-2011 chl

Do not check malloc return value against NULL, as M_WAITOK is used.

ok dlg@ krw@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.12 19-May-2010 oga

BUS_DMA_ZERO instead of alloc, map, bzero.

ok krw@


Revision tags: OPENBSD_4_7_BASE
# 1.11 13-Aug-2009 jasper

- consistify cfdriver for the ethernet drivers (0 -> NULL)

ok dlg@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.10 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.9 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.8 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.7 23-May-2008 brad

Simplify the combination use of pci_mapreg_type()/pci_mapreg_map() as
suggested by dlg@ awhile ago.

ok dlg@


Revision tags: OPENBSD_4_3_BASE
# 1.6 16-Jan-2008 thib

Set the baudrate with IF_Gbps(10); and remove an
XXX comment now that if_baudrate is 64bits.

ok reyk@


Revision tags: OPENBSD_4_2_BASE
# 1.5 01-Jun-2007 reyk

initialize the rings


# 1.4 31-May-2007 reyk

further improvement of the bus space i/o. firmware loading, booting,
and card initialization works now.

thanks to dlg@ who pointed me to the fact that
bus_space_write_region_N and bus_space_write_raw_region_N use count of
elements vs. size of buffer arguments.


# 1.3 31-May-2007 reyk

enable all debugging messages by default if the driver is compiled with
MYX_DEBUG


# 1.2 31-May-2007 reyk

fix the myx_write function


# 1.1 31-May-2007 reyk

initial bits of a new driver for the Myricom Myri-10G Lanai-Z8E 10Gb
Ethernet chipset. not working yet.

ok dlg@


Revision tags: OPENBSD_6_0_BASE
# 1.95 23-May-2016 tedu

remove the function pointer from mbufs. this memory is shared with data
via unions, and we don't want to make it easy to control the target.
instead an integer index into an array of acceptable functions is used.
drivers using custom functions must register them to receive an index.
ok deraadt


# 1.94 13-Apr-2016 mpi

G/C IFQ_SET_READY().


# 1.93 13-Apr-2016 mpi

G/C IFQ_SET_READY().


Revision tags: OPENBSD_5_9_BASE
# 1.92 11-Dec-2015 mpi

Replace mountroothook_establish(9) by config_mountroot(9) a narrower API
similar to config_defer(9).

ok mikeb@, deraadt@


# 1.91 09-Dec-2015 dlg

rework the if_start mpsafe serialisation so it can serialise arbitrary work

work is represented by struct task.

the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.

this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified in between when a start
routine discovers theres no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.

by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.

tested on various nics
ok mpi@


# 1.90 03-Dec-2015 dlg

tell the stack myx_start is mpsafe.

as per the stack commit, the driver changes are:

1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)


# 1.89 01-Dec-2015 dlg

myx doesnt use atomic.h anymore.


# 1.88 25-Nov-2015 dlg

replace IFF_OACTIVE manipulation with mpsafe operations.

there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.

IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operates mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.

instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.

this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifsq_{set,clr_is}_oactive API too.

ok kettenis@ mpi@ jmatthew@ deraadt@


# 1.87 24-Nov-2015 dlg

fix tx ring accounting in myx_start.

turns out i was calculating the number of packets (not descriptors)
on the tx ring, and then using that as the free space for descriptors.


# 1.86 19-Nov-2015 dlg

get rid of sc_tx_free and the atomic ops on it in myx_start and myx_txeof.

myx_start calculates the free space by reading the consumer index
and doing some maths, which lets us avoid the interlocked cpu ops.


# 1.85 25-Oct-2015 mpi

arp_ifinit() is no longer needed.


# 1.84 29-Sep-2015 dlg

get rid of the mutex between access to the status block and myx_down

myx is unusual in that it has an explicit command to shut down the
chip that gets an interrupt when it's done. so myx_down sends the
command and has to sleep until it gets that interrupt. this moves
to using a single int to represent that state (so loads and stores
are atomic), and sleep_setup/sleep_finish in myx_down to wait for
it to change.

this has been running in production at work for a few months now
tested by chris@


# 1.83 01-Sep-2015 deraadt

free() firmware with right len; ok dlg


# 1.82 15-Aug-2015 dlg

do the global tx free accounting in myx_start with a single atomic op
instead of one per packet.

seems to let me send packets a little faster.


# 1.81 15-Aug-2015 dlg

rework the tx path to use a ring to keep track of dmamaps/mbufs.

this removes the myx_buf structure and uses myx_slot instead. theyre
the same expcet slots dont have list entry structures, so theyre
smaller.

this cuts out four mutex ops per packet out of the tx handling.
just have to get rid of the atomic op per packet in myx_start now.


# 1.80 14-Aug-2015 dlg

move to a per rx ring timeout for refilling empty rings.

this lets me get rid of the locking around the refilling of the rx ring.

the timeout only runs refill if the rx ring is empty. we know rxeof
wont try and refill it in that situation because there's no packets
on the ring so we wont get interrupts for it. therefore we dont
need to lock between the timeout and rxeof cos they cant run at the
same time.


# 1.79 14-Aug-2015 dlg

rework how we track the packets on the rx rings.

originally there were two mutex protected lists for rx packets, a
list of free packets, and a list of packets that were on the ring.
filling the ring popped packets off the free list, attached an mbuf
and dmamapped it, and pushed it onto the list of active packets.
the hw fills packets in order, so on rx completion we'd pop packets
off the active list, unmap the mbuf and shove it up the stack before
putting the packet on the free list.

the problem with the lists is that every rx ring operation resulted
in two mutex ops. so 4 mutex ops per packet after you do both fill
and rxeof.

this replaces the mutexed lists with rings that shadow the hardware
rings. filling the rx ring pushes a producer index along, while
rxeof chases it with a consumer. because we know only one thing can
do either of those tasks at a time, we can get away with not using
atomic ops for them.

there's more to be done, but this is a good first step.
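
The producer/consumer scheme above can be sketched roughly as follows. This is an illustration under assumptions, not the driver's code: the struct and function names are made up, the slot holds a plain pointer instead of an mbuf and dmamap, and the sketch ignores memory ordering between CPUs.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8	/* illustrative size, must be a power of two */

struct slot {
	void *buf;	/* stands in for the mbuf/dmamap pair */
};

struct shadow_ring {
	struct slot slots[RING_SLOTS];
	unsigned int prod;	/* only advanced by the fill path */
	unsigned int cons;	/* only advanced by the completion path */
};

/* fill side: returns 0 on success, -1 if the ring is full */
static int
ring_fill(struct shadow_ring *r, void *buf)
{
	assert(r != NULL);
	if (r->prod - r->cons == RING_SLOTS)
		return (-1);
	r->slots[r->prod % RING_SLOTS].buf = buf;
	r->prod++;
	return (0);
}

/* completion side: returns the oldest buffer, or NULL if the ring is empty */
static void *
ring_complete(struct shadow_ring *r)
{
	void *buf;

	assert(r != NULL);
	if (r->cons == r->prod)
		return (NULL);
	buf = r->slots[r->cons % RING_SLOTS].buf;
	r->cons++;
	return (buf);
}
```

Each index has exactly one writer, which is what lets the sketch get by without a mutex or atomic ops. Buffers come back in the order they were filled, matching hardware that completes packets in order.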


Revision tags: OPENBSD_5_8_BASE
# 1.78 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.77 17-May-2015 chris

We don't need KERNEL_LOCK() around if_input() anymore, as if_input() has
appropriate locking around bpf now.

ok dlg@


# 1.76 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.75 20-Feb-2015 chris

Now that if_input() is a thing, use it

ok dlg@


# 1.74 18-Feb-2015 dlg

myri employees and their drivers for linux and solaris have repeatedly
told me that if you're going to rx into buffers greater than 4k in
size, they have to be aligned to a 4k boundary.

the mru of this chip is 9k, but ive been using the 12k mcl pool to
provide the alignment. however, if we move to putting 8 items on a
pool page there'll be enough slack space in the mcl12k pool pages
to allow item colouring, which in turn will break the chip requirement
above. in practice the chips i have seem to work fine with unaligned
buffers, but i dont want to risk breaking early revision chips.

this moves myx to using a private pool for allocating clusters for
the big rx ring. the item size is 9k, but we specify a 4k alignment
so every item we get out of it will be correct for the chip.


# 1.73 18-Feb-2015 dlg

enable pcie relaxed transaction ordering and bump the max payload
size up to 4k.

found while reading someone elses driver.


# 1.72 22-Dec-2014 tedu

unifdef INET


# 1.71 28-Oct-2014 dlg

the if_rxring accounting would get screwed up if the first mbuf to
be put on the ring couldnt be allocated.

this pulls the code that puts the mbufs on the ring out of myx_rx_fill
so it can return early if firstmb cant be allocated, which puts it
in the right place to return unused slots to the if_rxring.

this means myx rx wont lock up if you're DoSsed to the point where
you exhaust your mbuf pools and cant allocate mbufs for the ring.

ok jmatthew@


# 1.70 04-Oct-2014 dlg

replace mutexes to serialise the operations on the flag that restricts
the number of contexts that are refilling the rx rings with atomic
ops.

this is borrowed from code i wrote for the scsi midlayer but cant
put in yet because i havent got atomic.h up to scratch on all archs
yet. the archs myx runs on do have enough atomic.h to be fine though.


# 1.69 03-Oct-2014 dlg

refill the rx ring in myx_rxeof, not much later at the end of myx_intr.


# 1.68 03-Oct-2014 dlg

in rxeof, instead of taking the biglock on every packet to call bpf
and ether_input, queue all the mbufs onto an mbuf_list on the stack
and then take the biglock once outside the loop.


# 1.67 03-Oct-2014 dlg

we dont need the kernel lock to call bus_dmamap_load and unload thanks
to kettenis.

move the if_ipacket and if_opacket increments out of biglock too.
theyre only updated from the interrupt handler, which is only run
on a single cpu so there's no chance of the update racing. everywhere
else only reads them.


# 1.66 03-Oct-2014 dlg

dont need to hold the kernel lock to call MCLGETI and m_freem now.


# 1.65 03-Oct-2014 dlg

dont take the kernel lock on every interrupt in case we might change
the link state or to clear OACTIVE, just take it when we know we
really are going to do those things.


# 1.64 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


# 1.63 19-Aug-2014 dlg

in myx_start, replace

	while (space) {
		IFQ_POLL;
		myx_dequeue(free descr);
		IFQ_DEQUEUE;
		etc;
	}

with

	while (space && myx_dequeue(free descr)) {
		IFQ_DEQUEUE;
		etc;
	}


# 1.62 18-Aug-2014 dlg

dont rely on mbuf.h to provide pool.h.

ok miod@, who has offered to help with any MD fallout
ok guenther@


Revision tags: OPENBSD_5_6_BASE
# 1.61 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.60 10-Jul-2014 dlg

rings that dont rx packets dont need to be refilled.


# 1.59 08-Jul-2014 dlg

cut things that relied on mclgeti for rx ring accounting/restriction over
to using if_rxr.

cut the reporting systat did over to the rxr ioctl.

tested as much as i can on alpha, amd64, and sparc64.
mpi@ has run it on macppc.
ok mpi@


# 1.58 17-Jun-2014 dlg

whitespace fix.

im sick of fixing this by hand on all my boxes while hacking on
other stuff and having it pollute my diffs.

no functional change.


# 1.57 24-Mar-2014 dlg

nothing after the irq ack posting relies on it being ordered.


Revision tags: OPENBSD_5_5_BASE
# 1.56 10-Feb-2014 dlg

the mac addresses you program with MYXCMD_SET_MCASTGROUP are in a different
format to the one used for MYXCMD_SET_LLADDR. for reasons.

this lets ospf work if you dont happen to have PROMISC enabled on your
interface like my production firewalls happen to have, which is why i
never noticed this before.


# 1.55 05-Feb-2014 dlg

after running myx(4) without biglock in production for a few days
i discovered that there's a race between the interrupt code and
myx_start which causes the count of free tx descriptors to get
distorted, which eventually leads to a permanent setting of
IFF_OACTIVE, which in turn prevents the driver from transmitting
packets.

fixing that went horribly wrong when i then discovered that there's
a race between the interrupt handler and myx_down, where the interrupt
can tell myx_down to wake up and free all the rings while the
interrupt handler is still looking at them. free panics for all.

this moves the handling of the tx free count under the biglock (for
now), and moves myx_up and myx_down to managing a "driver state"
variable independently of the IFF_UP and IFF_RUNNING flags, and
very very careful reordering of the checks of that state variable
and the hardware state.

as a bonus we get to avoid excessive calls to myx_txeof and myx_rxeof
in the isr, and less stuff checked unconditionally. on the other
hand, the sc_state handling added some more checks so it might not
be a win overall.

tested on smp sparc64 with msi and nonmsi interrupts, and on amd64 smp
in production again.


# 1.54 31-Jan-2014 dlg

sc_function is set, but never used for anything useful. clean it up...


# 1.53 31-Jan-2014 dlg

sc_lladdr is never used, so we can get the space in the sc back.


# 1.52 23-Jan-2014 dlg

a lot of people have pointed out to me that taking a lock just to read an
int isnt necessary.


# 1.51 23-Jan-2014 dlg

factor the mutex/bus_space handling of the sts block out.


# 1.50 21-Jan-2014 dlg

introduce fine grained locking.

this doesnt give up the big lock coming from process context, only from
the interrupt side. it is excessively careful about when it takes
the big lock again. notably it goes to a lot of effort to not hold
a mutex while calling into other subsystems or before taking the
big lock.

ive been hitting it as hard as i can without problems.

intensely read by mpi@
ok claudio@ kettenis@


# 1.49 19-Jan-2014 dlg

white space fix


# 1.48 19-Jan-2014 dlg

introduce fine grained locking around the lists of packet handlers
myx maintains. this moves it away from relying on splnet to protect
them.


# 1.47 19-Jan-2014 dlg

hwflags is never used, so clean it up


# 1.46 19-Jan-2014 dlg

replace bcmp with memcmp


# 1.45 19-Jan-2014 dlg

bcopy to memcpy


# 1.44 19-Jan-2014 dlg

replace bzero with memset.


# 1.43 19-Jan-2014 dlg

all 64bit archs myx runs on support bus_space 8 things because of work i
did at n2k13.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.42 29-Jan-2013 brad

- Set ENETRESET within myx_ioctl() instead of calling myx_iff() directly, to be
consistent with other drivers.
- Clear IFF_ALLMULTI flag early and at the top of myx_iff().
- Set IFF_ALLMULTI when in promisc mode.

ok dlg@


# 1.41 25-Jan-2013 dlg

we go to a lot of effort to post the first tx descriptor last, but we
really should be trying to post everything except the flags field in the
first tx descriptor. this shuffles things around so the rest of that first
txd is posted as part of the "everything else" before its flags field.


# 1.40 25-Jan-2013 dlg

the myx_dmamem struct doesnt need a name.


# 1.39 21-Jan-2013 dlg

myx does reads and writes in one direction to packet buffers. lets try
STREAMING them.


# 1.38 15-Jan-2013 dlg

dont use bus_space_write_raw_region_8. amd64 is currently broken cos it
doesnt have it. disabling it for now.


# 1.37 15-Jan-2013 dlg

use bus_space_write_raw_region_8 on 64bit archs when writing to the rings


# 1.36 14-Jan-2013 dlg

map the registers PREFETCHABLE so things that can do write combining can
try and do write combining like the myx doco likes.


# 1.35 14-Jan-2013 dlg

avoid extra bus_space barriers in the interrupt handler.


# 1.34 14-Jan-2013 dlg

when posting descriptors to the chips rings, avoid going write barrier
write barrier write barrier when using myx_write to post descriptors.

instead let it go write write write barrier by using the appropriate
bus_space write directly followed by a single bus_space barrier.

the story above is mostly true, except that myx wants us to write all the
descriptors except the first, barrier, and then write the first one out to
signal that the chip can proceed.

it is also worth noting that the barriers cover more address space than
what we actually wrote to. this makes the code much simpler, and avoids
generating extra fence operations (which is what barrier functions end up
as on most of our archs) when we wrap around the end of the ring. the
bus_space doco encourages this.

bus_space use was discussed with krw@ kettenis@ deraadt@


# 1.33 14-Jan-2013 dlg

the myri doco suggests its nice to post stuff by filling in everything
in the rings except the first descriptor. once you've written as
much as you can out, then you go back and post the first descriptor
to signal that the chip should go ahead and work.


# 1.32 14-Jan-2013 dlg

;; is a long way of saying ;


# 1.31 29-Nov-2012 brad

Remove setting an initial assumed baudrate upon driver attach which is not
necessarily correct, there might not even be a link when attaching.

ok mikeb@ reyk@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.30 28-Nov-2011 blambert

Fix reversed error-handling gotos in myx_buf_fill(), which would lead to
either an mbuf leak or a NULL pointer dereference.

ok sthen@ claudio@ dlg@
testing claudio@ dlg@


Revision tags: OPENBSD_5_0_BASE
# 1.29 08-Aug-2011 dlg

myx requires the driver pad short ethernet frames to 60 bytes by
adding a descriptor pointing at zeroed bytes onto the end of transmit
chains. i was accounting for this extra descriptor when i was
completing the chain, but not when i was setting this up. this
meant the number of free descriptors kept growing until it overflowed.
at this point the check for space in the ring failed and packets
no longer flowed.

this counts the pad descriptor in the tx chain setup too.

ok deraadt@
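
The accounting rule from the entry above, as a hedged sketch: the helper name and the way segments are counted are assumptions for illustration, only the 60 byte minimum comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_FRAME_LEN 60	/* short frames are padded up to this length */

/*
 * Hypothetical helper: descriptors one packet occupies on the tx ring.
 * A short frame needs one extra descriptor pointing at zeroed pad bytes,
 * and that descriptor has to be counted both when the chain is set up
 * and when it is completed.
 */
static unsigned int
tx_descs_needed(unsigned int nsegs, size_t pktlen)
{
	assert(nsegs > 0);
	return (nsegs + (pktlen < MIN_FRAME_LEN ? 1 : 0));
}
```

Using one function on both the setup and completion sides keeps the two counts from drifting apart, which is the failure this revision describes.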


# 1.28 23-Jun-2011 dlg

cope with empty rx rings by scheduling a timeout to keep trying until it
gets some packets onto the rings.

also annoying, but the hardware doesnt report empty rings, we have to
handle it ourselves.


# 1.27 23-Jun-2011 dlg

this chip has an annoying "feature" where it cannot report the link
state unless the chip is up and handling packets. while its down
it does not report the link state, so it is unknown.

this tweaks the link state handling, in particular it adds code to
myx_down so it moves the link state to unknown, ie, it correctly
reflects reality.

stupidity pointed out by deraadt


# 1.26 22-Jun-2011 deraadt

reset the tx_count on UP, since it may have been advanced from non-zero
by a previous use
ok claudio


# 1.25 22-Jun-2011 dlg

msi support. this is a complicated one...

ok kettenis@


# 1.24 22-Jun-2011 jsg

another myri10ge device matched by freebsd/linux drivers
ok dlg@


# 1.23 22-Jun-2011 dlg

oops, handle refill like i said i was going to two revisions ago.


# 1.22 22-Jun-2011 deraadt

set the mac address on the chip correctly (repair the byte order)
it now works on sparc64, too
ok dlg


# 1.21 22-Jun-2011 dlg

deraadt plugged his myx into a sparc64 and discovered 3 problems:

1. we want to write raw values to registers all the time, so promote the
myx_raw{read,write} to myx_{read,write} and use them everywhere. get rid
of the raw funcs.
2. i was setting the watermarks on the rx ring before knowing how big
they were.
3. rxfill in the interrupt handler could lose data if you loop on
sts_isvalid.

almost working now...

"please commit your diff" deraadt@


# 1.20 21-Jun-2011 dlg

do the unaligned dma tests so we can figure out if we need to fall
back to the unaligned firmware. apparently this is only an issue
on the "A" controllers which have been superseded, but those are
the chips we (openbsd devs) have.


# 1.19 21-Jun-2011 dlg

report the controllers part number. eg, i now know i have a
10G-PCIE-8A-R. dmesg looks like this:

myx0 at pci4 dev 0 function 0 "Myricom Z8E" rev 0x00: apic 1 int 8, model 10G-PCIE-8A-R, address 00:60:dd:47:c6:74


# 1.18 21-Jun-2011 dlg

wire up jumbos properly. the hardware supports up to 9018 bytes off
the wire (9000 + ether header + vlan tag), but has some cool alignment
requirements. if you want to use a single rx ring desc to point at
a jumbo it needs to start on a 4k boundary and be physically
contiguous. to ensure this im pulling frames from the 12k pool and
waiting for arianes diff to ensure mbufs are contig.

direction from andrew gallatin. tested locally.


# 1.17 21-Jun-2011 deraadt

minor cleanups; ok dlg


# 1.16 20-Jun-2011 dlg

make the interrupt handler look more like what the doco suggests. seems to
fix a bad lockup i kept getting.


# 1.15 20-Jun-2011 dlg

dont need debug, the myx_cmd stuff works fine.


# 1.14 20-Jun-2011 dlg

i got myx working!


# 1.13 02-May-2011 chl

Do not check malloc return value against NULL, as M_WAITOK is used.

ok dlg@ krw@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.12 19-May-2010 oga

BUS_DMA_ZERO instead of alloc, map, bzero.

ok krw@


Revision tags: OPENBSD_4_7_BASE
# 1.11 13-Aug-2009 jasper

- consistify cfdriver for the ethernet drivers (0 -> NULL)

ok dlg@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.10 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.9 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.8 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.7 23-May-2008 brad

Simplify the combination use of pci_mapreg_type()/pci_mapreg_map() as
suggested by dlg@ awhile ago.

ok dlg@


Revision tags: OPENBSD_4_3_BASE
# 1.6 16-Jan-2008 thib

Set the baudrate with IF_Gbps(10); and remove an
XXX comment now that if_baudrate is 64bits.

ok reyk@


Revision tags: OPENBSD_4_2_BASE
# 1.5 01-Jun-2007 reyk

initialize the rings


# 1.4 31-May-2007 reyk

further improvement of the bus space i/o. firmware loading, booting,
and card initialization works now.

thanks to dlg@ who pointed me to the fact that
bus_space_write_region_N and bus_space_write_raw_region_N use count of
elements vs. size of buffer arguments.


# 1.3 31-May-2007 reyk

enable all debugging messages by default if the driver is compiled with
MYX_DEBUG


# 1.2 31-May-2007 reyk

fix the myx_write function


# 1.1 31-May-2007 reyk

initial bits of a new driver for the Myricom Myri-10G Lanai-Z8E 10Gb
Ethernet chipset. not working yet.

ok dlg@


# 1.87 24-Nov-2015 dlg

fix tx ring accounting in myx_start.

turns out i was calculating the number of packets (not descriptors)
on the tx ring, and then using that as the free space for descriptors.


# 1.86 19-Nov-2015 dlg

get rid of sc_tx_free and the atomic ops on it in myx_start and myx_txeof.

myx_start calculates the free space by reading the consumer index
and doing some maths, which lets us avoid the interlocked cpu ops.


# 1.85 25-Oct-2015 mpi

arp_ifinit() is no longer needed.


# 1.84 29-Sep-2015 dlg

get rid of the mutex between access to the status block and myx_down

myx is unusual in that it has an explicit command to shut down the
chip that gets an interrupt when it's done. so myx_down sends the
command and has to sleep until it gets that interrupt. this moves
to using a single int to represent that state (so loads and stores
are atomic), and sleep_setup/sleep_finish in myx_down to wait for
it to change.

this has been running in production at work for a few months now
tested by chris@


# 1.83 01-Sep-2015 deraadt

free() firmware with right len; ok dlg


# 1.82 15-Aug-2015 dlg

do the global tx free accounting in myx_start with a single atomic op
instead of one per packet.

seems to let me send packets a little faster.


# 1.81 15-Aug-2015 dlg

rework the tx path to use a ring to keep track of dmamaps/mbufs.

this removes the myx_buf structure and uses myx_slot instead. theyre
the same expcet slots dont have list entry structures, so theyre
smaller.

this cuts out four mutex ops per packet out of the tx handling.
just have to get rid of the atomic op per packet in myx_start now.


# 1.80 14-Aug-2015 dlg

move to a per rx ring timeout for refilling empty rings.

this lets me get rid of the locking around the refilling of the rx ring.

the timeout only runs refill if the rx ring is empty. we know rxeof
wont try and refill it in that situation because there's no packets
on the ring so we wont get interrupts for it. therefore we dont
need to lock between the timeout and rxeof cos they cant run at the
same time.


# 1.79 14-Aug-2015 dlg

rework how we track the packets on the rx rings.

originally there were two mutex protected lists for rx packets, a
list of free packets, and a list of packets that were on the ring.
filling the ring popped packets off the free list, attached an mbuf
and dmamapped it, and pushed it onto the list of active packets.
the hw fills packets in order, so on rx completion we'd pop packets
the active list, unmap the mbuf and shove it up the stack before
putting the packet on the free list.

the problem with the lists is that every rx ring operation resulted
in two mutex ops. so 4 mutex ops per packet after you do both fill
and rxeof.

this replaces the mutexed lists with rings that shadow the hardware
rings. filling the rx ring pushes a producer index along, while
rxeof chases it with a consumer. because we know only one thing can
do either of those tasks at a time, we can get away with not using
atomic ops for them.

there's more to be done, but this is a good first step.


Revision tags: OPENBSD_5_8_BASE
# 1.78 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.77 17-May-2015 chris

We don't need KERNEL_LOCK() around if_input() anymore, as if_input() has
appropriate locking around bpf now.

ok dlg@


# 1.76 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.75 20-Feb-2015 chris

Now that if_input() is a thing, use it

ok dlg@


# 1.74 18-Feb-2015 dlg

myri employees and their drivers for linux and solaris have repeatedly
told me that if you're going to rx into buffers greater than 4k in
size, they have to be aligned to a 4k boundary.

the mru of this chip is 9k, but ive been using the 12k mcl pool to
provide the alignment. however, if we move to putting 8 items on a
pool page there'll be enough slack space in the mcl12k pool pages
to allow item colouring, which in turn will break the chip requirement
above. in practice the chips i have seem to work fine with unaligned
buffers, but i dont want to risk breaking early revision chips.

this moves myx to using a private pool for allocating clusters for
the big rx ring. the item size is 9k, but we specify a 4k alignment
so every item we get out of it will be correct for the chip.


# 1.73 18-Feb-2015 dlg

enable pcie relaxed transaction ordering and bump the max payload
size up to 4k.

found while reading someone elses driver.


# 1.72 22-Dec-2014 tedu

unifdef INET


# 1.71 28-Oct-2014 dlg

the if_rxring accounting would get screwed up if the first mbuf to
be put on the ring couldnt be allocated.

this pulls the code that puts the mbufs on the ring out of myx_rx_fill
so it can return early if firstmb cant be allocated, which puts it
in the right place to return unused slots to the if_rxring.

this means myx rx wont lock up if you're DoSsed to the point where
you exhaust your mbuf pools and cant allocate mbufs for the ring.

ok jmatthew@


# 1.70 04-Oct-2014 dlg

replace mutexes to serialise the operations on the flag that restricts
the number of contexts that are refilling the rx rings with atomic
ops.

this is borrowed from code i wrote for the scsi midlayer but cant
put in yet because i havent got atomic.h up to scrach on all archs
yet. the archs myx runs on do have enough atomic.h to be fine though.


# 1.69 03-Oct-2014 dlg

refill the rx ring in myx_rxeof, not much later at the end of myx_intr.


# 1.68 03-Oct-2014 dlg

in rxeof, instead of taking the biglock on every packet to call bpf
and ether_input, queue all the mbufs onto an mbuf_list on the stack
and then take the biglock once outside the loop.


# 1.67 03-Oct-2014 dlg

we dont need the kernel lock to call bus_dmamap_load and unload thanks
to kettenis.

move the if_ipackets and if_opackets increments out of biglock too.
theyre only updated from the interrupt handler, which is only run
on a single cpu so there's no chance of the update racing. everywhere
else only reads them.


# 1.66 03-Oct-2014 dlg

dont need to hold the kernel lock to call MCLGETI and m_freem now.


# 1.65 03-Oct-2014 dlg

dont take the kernel lock on every interrupt in case we might change
the link state or to clear OACTIVE, just take it when we know we
really are going to do those things.


# 1.64 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


# 1.63 19-Aug-2014 dlg

in myx_start, replace

while (space) {
	IFQ_POLL;
	myx_dequeue(free descr);
	IFQ_DEQUEUE;
	etc;
}

with

while (space && myx_dequeue(free descr)) {
	IFQ_DEQUEUE;
	etc;
}


# 1.62 18-Aug-2014 dlg

dont rely on mbuf.h to provide pool.h.

ok miod@, who has offered to help with any MD fallout
ok guenther@


Revision tags: OPENBSD_5_6_BASE
# 1.61 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.60 10-Jul-2014 dlg

rings that dont rx packets dont need to be refilled.


# 1.59 08-Jul-2014 dlg

cut things that relied on mclgeti for rx ring accounting/restriction over
to using if_rxr.

cut the reporting systat did over to the rxr ioctl.

tested as much as i can on alpha, amd64, and sparc64.
mpi@ has run it on macppc.
ok mpi@


# 1.58 17-Jun-2014 dlg

whitespace fix.

im sick of fixing this by hand on all my boxes while hacking on
other stuff and having it pollute my diffs.

no functional change.


# 1.57 24-Mar-2014 dlg

nothing after the irq ack posting relies on it being ordered.


Revision tags: OPENBSD_5_5_BASE
# 1.56 10-Feb-2014 dlg

the mac addresses you program with MYXCMD_SET_MCASTGROUP are in a different
format to the one used for MYXCMD_SET_LLADDR. for reasons.

this lets ospf work if you dont happen to have PROMISC enabled on your
interface like my production firewalls happen to have, which is why i
never noticed this before.


# 1.55 05-Feb-2014 dlg

after running myx(4) without biglock in production for a few days
i discovered that there's a race between the interrupt code and
myx_start which causes the count of free tx descriptors to get
distorted, which eventually leads to a permanent setting of
IFF_OACTIVE, which in turn prevents the driver from transmitting
packets.

fixing that went horribly wrong when i then discovered that there's
a race between the interrupt handler and myx_down, where the interrupt
can tell myx_down to wake up and free all the rings while the
interrupt handler is still looking at them. free panics for all.

this moves the handling of the tx free count under the biglock (for
now), and moves myx_up and myx_down to managing a "driver state"
variable independently of the IFF_UP and IFF_RUNNING flags, and
very very careful reordering of the checks of that state variable
and the hardware state.

as a bonus we get to avoid excessive calls to myx_txeof and myx_rxeof
in the isr, and less stuff checked unconditionally. on the other
hand, the sc_state handling added some more checks so it might not
be a win overall.

tested on smp sparc64 with msi and nonmsi interrupts, and on amd64 smp
in production again.


# 1.54 31-Jan-2014 dlg

sc_function is set, but never used for anything useful. clean it up...


# 1.53 31-Jan-2014 dlg

sc_lladdr is never used, so we can get the space in the sc back.


# 1.52 23-Jan-2014 dlg

a lot of people have pointed out to me that taking a lock just to read an
int isnt necessary.


# 1.51 23-Jan-2014 dlg

factor the mutex/bus_space handling of the sts block out.


# 1.50 21-Jan-2014 dlg

introduce fine grained locking.

this doesnt give up the big lock coming from process context, only from
the interrupt side. it is excessively careful about when it takes
the big lock again. notably it goes to a lot of effort to not hold
a mutex while calling into other subsystems or before taking the
big lock.

ive been hitting it as hard as i can without problems.

intensely read by mpi@
ok claudio@ kettenis@


# 1.49 19-Jan-2014 dlg

white space fix


# 1.48 19-Jan-2014 dlg

introduce fine grained locking around the lists of packet handlers
myx maintains. this moves it away from relying on splnet to protect
them.


# 1.47 19-Jan-2014 dlg

hwflags is never used, so clean it up


# 1.46 19-Jan-2014 dlg

replace bcmp with memcmp


# 1.45 19-Jan-2014 dlg

bcopy to memcpy


# 1.44 19-Jan-2014 dlg

replace bzero with memset.


# 1.43 19-Jan-2014 dlg

all 64bit archs myx runs on support bus_space 8 things because of work i
did at n2k13.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.42 29-Jan-2013 brad

- Set ENETRESET within myx_ioctl() instead of calling myx_iff() directly, to be
consistent with other drivers.
- Clear IFF_ALLMULTI flag early and at the top of myx_iff().
- Set IFF_ALLMULTI when in promisc mode.

ok dlg@


# 1.41 25-Jan-2013 dlg

we go to a lot of effort to post the first tx descriptor last, but we
really should be trying to post everything except the flags field in the
first tx descriptor. this shuffles things around so the rest of that first
txd is posted as part of the "everything else" before its flags field.


# 1.40 25-Jan-2013 dlg

the myx_dmamem struct doesnt need a name.


# 1.39 21-Jan-2013 dlg

myx does reads and writes in one direction to packet buffers. lets try
STREAMING them.


# 1.38 15-Jan-2013 dlg

dont use bus_space_write_raw_region_8: amd64 is currently broken cos it
has no bus_space_write_raw_region_8. disabling it for now.


# 1.37 15-Jan-2013 dlg

use bus_space_write_raw_region_8 on 64bit archs when writing to the rings


# 1.36 14-Jan-2013 dlg

map the registers PREFETCHABLE so things that can do write combining can
try and do write combining like the myx doco likes.


# 1.35 14-Jan-2013 dlg

avoid extra bus_space barriers in the interrupt handler.


# 1.34 14-Jan-2013 dlg

when posting descriptors to the chips rings, avoid going write barrier
write barrier write barrier when using myx_write to post descriptors.

instead let it go write write write barrier by using the appropriate
bus_space write directly followed by a single bus_space barrier.

the story above is mostly true, except that myx wants us to write all the
descriptors except the first, barrier, and then write the first one out to
signal that the chip can proceed.

it is also worth noting that the barriers cover more address space than
what we actually wrote to. this makes the code much simpler, and avoids
generating extra fence operations (which is what barrier functions end up
as on most of our archs) when we wrap around the end of the ring. the
bus_space doco encourages this.

bus_space use was discussed with krw@ kettenis@ deraadt@
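
The posting order described in 1.34 can be sketched in userland C. This is only an illustration under stated assumptions: `post_descs` and `NDESC` are hypothetical names, the ring is plain memory rather than device registers, and `atomic_thread_fence()` merely stands in for a bus_space barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NDESC 4	/* hypothetical ring size */

/*
 * Post n descriptors: write everything except the first, issue one
 * barrier, then write the first to hand the batch to the consumer.
 * atomic_thread_fence() is only a userland stand-in for a bus_space
 * barrier here.
 */
static void
post_descs(volatile uint32_t *ring, const uint32_t *descs, unsigned int n)
{
	unsigned int i;

	assert(n > 0 && n <= NDESC);
	for (i = 1; i < n; i++)
		ring[i] = descs[i];
	atomic_thread_fence(memory_order_release);
	ring[0] = descs[0];
}
```

The single fence replaces one barrier per write, which is the saving the log entry is after.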


# 1.33 14-Jan-2013 dlg

the myri doco suggests its nice to post stuff by filling in everything
in the rings except the first descriptor. once you've written as
much as you can out, then you go back and post the first descriptor
to signal that the chip should go ahead and work.


# 1.32 14-Jan-2013 dlg

;; is a long way of saying ;


# 1.31 29-Nov-2012 brad

Remove setting an initial assumed baudrate upon driver attach which is not
necessarily correct, there might not even be a link when attaching.

ok mikeb@ reyk@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.30 28-Nov-2011 blambert

Fix reversed error-handling gotos in myx_buf_fill(), which would lead to
either an mbuf leak or a NULL pointer dereference.

ok sthen@ claudio@ dlg@
testing claudio@ dlg@


Revision tags: OPENBSD_5_0_BASE
# 1.29 08-Aug-2011 dlg

myx requires the driver pad short ethernet frames to 60 bytes by
adding a descriptor pointing at zeroed bytes onto the end of transmit
chains. i was accounting for this extra descriptor when i was
completing the chain, but not when i was setting this up. this
meant the number of free descriptors kept growing until it overflowed.
at this point the check for space in the ring failed and packets
no longer flowed.

this counts the pad descriptor in the tx chain setup too.

ok deraadt@
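
The accounting fix in 1.29 amounts to using the same descriptor count at setup and at completion. A minimal sketch, assuming a hypothetical helper `tx_desc_count` and a hypothetical constant `TOY_MIN_FRAME` (neither is taken from the driver):

```c
#include <assert.h>

#define TOY_MIN_FRAME 60	/* minimum frame length, per the log entry */

/*
 * Hypothetical helper: number of tx descriptors a packet consumes.
 * nsegs is the number of DMA segments; short frames need one extra
 * descriptor pointing at zeroed padding.
 */
static unsigned int
tx_desc_count(unsigned int nsegs, unsigned int pktlen)
{
	assert(nsegs > 0);
	return (nsegs + (pktlen < TOY_MIN_FRAME ? 1 : 0));
}
```

If both the setup path and the completion path call one helper like this, the free count cannot drift by the pad descriptor.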


# 1.28 23-Jun-2011 dlg

cope with empty rx rings by scheduling a timeout to keep trying until it
gets some packets onto the rings.

also annoying, but the hardware doesnt report empty rings, we have to
handle it ourselves.


# 1.27 23-Jun-2011 dlg

this chip has an annoying "feature" where it cannot report the link
state unless the chip is up and handling packets. while its down
it does not report the link state, so it is unknown.

this tweaks the link state handling, in particular it adds code to
myx_down so it moves the link state to unknown, ie, it correctly
reflects reality.

stupidity pointed out by deraadt


# 1.26 22-Jun-2011 deraadt

reset the tx_count on UP, since it may have been advanced from non-zero
by a previous use
ok claudio


# 1.25 22-Jun-2011 dlg

msi support. this is a complicated one...

ok kettenis@


# 1.24 22-Jun-2011 jsg

another myri10ge device matched by freebsd/linux drivers
ok dlg@


# 1.23 22-Jun-2011 dlg

oops, handle refill like i said i was going to two revisions ago.


# 1.22 22-Jun-2011 deraadt

set the mac address on the chip correctly (repair the byte order)
it now works on sparc64, too
ok dlg


# 1.21 22-Jun-2011 dlg

deraadt plugged his myx into a sparc64 and discovered 3 problems:

1. we want to write raw values to registers all the time, so promote the
myx_raw{read,write} to myx_{read,write} and use them everywhere. get rid
of the raw funcs.
2. i was setting the watermarks on the rx ring before knowing how big
they were.
3. rxfill in the interrupt handler could lose data if you loop on
sts_isvalid.

almost working now...

"please commit your diff" deraadt@


# 1.20 21-Jun-2011 dlg

do the unaligned dma tests so we can figure out if we need to fall
back to the unaligned firmware. apparently this is only an issue
on the "A" controllers which have been superseded, but those are
the chips we (openbsd devs) have.


# 1.19 21-Jun-2011 dlg

report the controllers part number. eg, i now know i have a
10G-PCIE-8A-R. dmesg looks like this:

myx0 at pci4 dev 0 function 0 "Myricom Z8E" rev 0x00: apic 1 int 8, model 10G-PCIE-8A-R, address 00:60:dd:47:c6:74


# 1.18 21-Jun-2011 dlg

wire up jumbos properly. the hardware supports up to 9018 bytes off
the wire (9000 + ether header + vlan tag), but has some cool alignment
requirements. if you want to use a single rx ring desc to point at
a jumbo it needs to start on a 4k boundary and be physically
contiguous. to ensure this im pulling frames from the 12k pool and
waiting for arianes diff to ensure mbufs are contig.

direction from andrew gallatin. tested locally.


# 1.17 21-Jun-2011 deraadt

minor cleanups; ok dlg


# 1.16 20-Jun-2011 dlg

make the interrupt handler look more like what the doco suggests. seems to
fix a bad lockup i kept getting.


# 1.15 20-Jun-2011 dlg

dont need debug, the myx_cmd stuff works fine.


# 1.14 20-Jun-2011 dlg

i got myx working!


# 1.13 02-May-2011 chl

Do not check malloc return value against NULL, as M_WAITOK is used.

ok dlg@ krw@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.12 19-May-2010 oga

BUS_DMA_ZERO instead of alloc, map, bzero.

ok krw@


Revision tags: OPENBSD_4_7_BASE
# 1.11 13-Aug-2009 jasper

- consistify cfdriver for the ethernet drivers (0 -> NULL)

ok dlg@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.10 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.9 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.8 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.7 23-May-2008 brad

Simplify the combination use of pci_mapreg_type()/pci_mapreg_map() as
suggested by dlg@ awhile ago.

ok dlg@


Revision tags: OPENBSD_4_3_BASE
# 1.6 16-Jan-2008 thib

Set the baudrate with IF_Gbps(10); and remove an
XXX comment now that if_baudrate is 64bits.

ok reyk@


Revision tags: OPENBSD_4_2_BASE
# 1.5 01-Jun-2007 reyk

initialize the rings


# 1.4 31-May-2007 reyk

further improvement of the bus space i/o. firmware loading, booting,
and card initialization works now.

thanks to dlg@ who pointed me to the fact that
bus_space_write_region_N and bus_space_write_raw_region_N use count of
elements vs. size of buffer arguments.


# 1.3 31-May-2007 reyk

enable all debugging messages by default if the driver is compiled with
MYX_DEBUG


# 1.2 31-May-2007 reyk

fix the myx_write function


# 1.1 31-May-2007 reyk

initial bits of a new driver for the Myricom Myri-10G Lanai-Z8E 10Gb
Ethernet chipset. not working yet.

ok dlg@


# 1.113 12-Dec-2020 jan

Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.

OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@


# 1.112 27-Nov-2020 kevlo

Add initialization of sc_sff_lock rwlock.

ok semarie@


Revision tags: OPENBSD_6_8_BASE
# 1.111 17-Jul-2020 dlg

name the rx rings so systat mb shows them.


# 1.110 17-Jul-2020 dlg

add kstats to myx.

myx is unusually minimal, so there's not a lot of information that
the chip provides. the most interesting is the number of packets
the chip drops cos of a lack of space on the rx rings.


# 1.109 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.108 03-Jul-2019 dlg

use ifiq_input return values to apply backpressure to rings.


# 1.107 16-Apr-2019 dlg

i2c reads are more reliable a byte at a time.

reading all 256 at a time was a nice idea, but meant page 0xa2 wasnt
appearing like it should. this follows what freebsd does more
closely too.


# 1.106 16-Apr-2019 dlg

make sff page reads work on little endian archs too. like amd64.

some modules seem to need more time when waiting for bytes while here.

hrvoje popovski hit the endian issue


# 1.105 15-Apr-2019 dlg

implement SIOCGIFSFFPAGE so ifconfig can get transceiver info.

myx doesn't allow i2c writes, so you can only read whatever page
the firmware is already pointing at on device 0xa0. if you try to
read another page it will return ENXIO.

tested on a 10G-PCIE-8A-R with an xfp module.


# 1.104 15-Apr-2019 dlg

trim some debug code that printed out the name of a command

the list of commands is going to grow, but the thought of keeping
the list in debug code up to date with it just makes me feel tired.

this prints the command id number instead in the same format we
represent it in the header.


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.103 01-Aug-2017 dlg

defer init of the myxmcl pool to mountroot, and enable pool cpu caches.

pool_cache_init cannot be called during autoconf because we cant
be confident about the number of cpus in the machine until the first
run of attaches.

mountroot is after autoconf, and myx already has code that runs
there for the firmware loading.

discussed with deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.102 07-Feb-2017 dlg

move the mbuf pools to m_pool_init and a single global memory limit

this replaces individual calls to pool_init, pool_set_constraints, and
pool_sethardlimit with calls to m_pool_init. m_pool_init inits the
mbuf pools with the mbuf pool allocator, and because of that doesnt
set per pool limits.

ok bluhm@ as part of a larger diff


# 1.101 24-Jan-2017 dlg

add support for multiple transmit ifqueues per network interface.

an ifq to transmit a packet is picked by the current traffic
conditioner (ie, priq or hfsc) by providing an index into an array
of ifqs. by default interfaces get a single ifq but can ask for
more using if_attach_queues().

the vast majority of our drivers still think there's a 1:1 mapping
between interfaces and transmit queues, so their if_start routines
take an ifnet pointer instead of a pointer to the ifqueue struct.
instead of changing all the drivers in the tree, drivers can opt
into using an if_qstart routine and setting the IFXF_MPSAFE flag.
the stack provides a compatibility wrapper from the new if_qstart
handler to the previous if_start handlers if IFXF_MPSAFE isnt set.

enabling hfsc on an interface configures it to transmit everything
through the first ifq. any other ifqs are left configured as priq,
but unused, when hfsc is enabled.

getting this in now so everyone can kick the tyres.

ok mpi@ visa@ (who provided some tweaks for cnmac).


# 1.100 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.99 31-Oct-2016 dlg

turns out these chips can handle buffers up to 9400 bytes in length.

raise the mtu to 9380 bytes so we can take advantage of the extra space.

i need to revisit the macro names at some point.


# 1.98 31-Oct-2016 dlg

revert 1.97 where i moved myx to using the system pools

my early revision board doesnt like it at all


# 1.97 28-Oct-2016 dlg

get rid of the custom pool in myx for jumbo frames.

now it asks the mbuf layer for the 9k from its pools.

a question from chris@ made me go look at the chip doco again and i
realised that the chip only requires 4 byte alignment for rx buffers,
no 4k alignment for jumbo buffers.

i also found that the chip is supposed to be able to rx up to 9400
bytes instead of 9000. ill fix that later though.


# 1.96 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


Revision tags: OPENBSD_6_0_BASE
# 1.95 23-May-2016 tedu

remove the function pointer from mbufs. this memory is shared with data
via unions, and we don't want to make it easy to control the target.
instead an integer index into an array of acceptable functions is used.
drivers using custom functions must register them to receive an index.
ok deraadt


# 1.94 13-Apr-2016 mpi

G/C IFQ_SET_READY().


# 1.93 13-Apr-2016 mpi

G/C IFQ_SET_READY().


Revision tags: OPENBSD_5_9_BASE
# 1.92 11-Dec-2015 mpi

Replace mountroothook_establish(9) by config_mountroot(9) a narrower API
similar to config_defer(9).

ok mikeb@, deraadt@


# 1.91 09-Dec-2015 dlg

rework the if_start mpsafe serialisation so it can serialise arbitrary work

work is represented by struct task.

the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.

this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified in between when a start
routine discovers theres no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.

by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.

tested on various nics
ok mpi@


# 1.90 03-Dec-2015 dlg

tell the stack myx_start is mpsafe.

as per the stack commit, the driver changes are:

1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)


# 1.89 01-Dec-2015 dlg

myx doesnt use atomic.h anymore.


# 1.88 25-Nov-2015 dlg

replace IFF_OACTIVE manipulation with mpsafe operations.

there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.

IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operations mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.

instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.

this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifq_{set,clr,is}_oactive API too.

ok kettenis@ mpi@ jmatthew@ deraadt@


# 1.87 24-Nov-2015 dlg

fix tx ring accounting in myx_start.

turns out i was calculating the number of packets (not descriptors)
on the tx ring, and then using that as the free space for descriptors.


# 1.86 19-Nov-2015 dlg

get rid of sc_tx_free and the atomic ops on it in myx_start and myx_txeof.

myx_start calculates the free space by reading the consumer index
and doing some maths, which lets us avoid the interlocked cpu ops.
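
The "some maths" in 1.86 is not spelled out in the log. One common way to do it, shown here purely as a sketch, treats the producer and consumer as free-running unsigned counters; `ring_used`, `ring_free` and the ring size are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: prod and cons are free-running 32-bit counters,
 * so unsigned subtraction gives the in-flight count even after
 * wraparound.
 */
static uint32_t
ring_used(uint32_t prod, uint32_t cons)
{
	return (prod - cons);
}

static uint32_t
ring_free(uint32_t prod, uint32_t cons, uint32_t size)
{
	uint32_t used = ring_used(prod, cons);

	assert(used <= size);	/* the consumer never passes the producer */
	return (size - used);
}
```

Only the producer side needs the result, so it can read the consumer index once and work from that snapshot without an interlocked operation.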


# 1.85 25-Oct-2015 mpi

arp_ifinit() is no longer needed.


# 1.84 29-Sep-2015 dlg

get rid of the mutex between access to the status block and myx_down

myx is unusual in that it has an explicit command to shut down the
chip that gets an interrupt when it's done. so myx_down sends the
command and has to sleep until it gets that interrupt. this moves
to using a single int to represent that state (so loads and stores
are atomic), and sleep_setup/sleep_finish in myx_down to wait for
it to change.

this has been running in production at work for a few months now
tested by chris@


# 1.83 01-Sep-2015 deraadt

free() firmware with right len; ok dlg


# 1.82 15-Aug-2015 dlg

do the global tx free accounting in myx_start with a single atomic op
instead of one per packet.

seems to let me send packets a little faster.


# 1.81 15-Aug-2015 dlg

rework the tx path to use a ring to keep track of dmamaps/mbufs.

this removes the myx_buf structure and uses myx_slot instead. theyre
the same except slots dont have list entry structures, so theyre
smaller.

this cuts out four mutex ops per packet out of the tx handling.
just have to get rid of the atomic op per packet in myx_start now.


# 1.80 14-Aug-2015 dlg

move to a per rx ring timeout for refilling empty rings.

this lets me get rid of the locking around the refilling of the rx ring.

the timeout only runs refill if the rx ring is empty. we know rxeof
wont try and refill it in that situation because there's no packets
on the ring so we wont get interrupts for it. therefore we dont
need to lock between the timeout and rxeof cos they cant run at the
same time.


# 1.79 14-Aug-2015 dlg

rework how we track the packets on the rx rings.

originally there were two mutex protected lists for rx packets, a
list of free packets, and a list of packets that were on the ring.
filling the ring popped packets off the free list, attached an mbuf
and dmamapped it, and pushed it onto the list of active packets.
the hw fills packets in order, so on rx completion we'd pop packets
off the active list, unmap the mbuf and shove it up the stack before
putting the packet on the free list.

the problem with the lists is that every rx ring operation resulted
in two mutex ops. so 4 mutex ops per packet after you do both fill
and rxeof.

this replaces the mutexed lists with rings that shadow the hardware
rings. filling the rx ring pushes a producer index along, while
rxeof chases it with a consumer. because we know only one thing can
do either of those tasks at a time, we can get away with not using
atomic ops for them.

there's more to be done, but this is a good first step.
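
A ring that shadows the hardware ring, as 1.79 describes, can be sketched with two indices that are each advanced by exactly one side. The names `shadow_ring`, `ring_fill`, `ring_rxeof` and the size `RING_SLOTS` are hypothetical, and the sketch ignores the memory ordering a real multiprocessor driver would have to consider.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8	/* hypothetical size; must be a power of two */

struct shadow_ring {
	void		*slots[RING_SLOTS];	/* buffer posted in each slot */
	unsigned int	 prod;			/* advanced only by the fill side */
	unsigned int	 cons;			/* advanced only by the rxeof side */
};

/* fill side: post a buffer, or return 0 if the ring is full */
static int
ring_fill(struct shadow_ring *r, void *buf)
{
	assert(buf != NULL);
	if (r->prod - r->cons == RING_SLOTS)
		return (0);
	r->slots[r->prod++ & (RING_SLOTS - 1)] = buf;
	return (1);
}

/* rxeof side: take the oldest posted buffer, or NULL if none are left */
static void *
ring_rxeof(struct shadow_ring *r)
{
	if (r->cons == r->prod)
		return (NULL);
	return (r->slots[r->cons++ & (RING_SLOTS - 1)]);
}
```

Buffers come back in the order they were posted, matching the in-order completion the log entry relies on, and neither side writes the other's index.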


Revision tags: OPENBSD_5_8_BASE
# 1.78 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.77 17-May-2015 chris

We don't need KERNEL_LOCK() around if_input() anymore, as if_input() has
appropriate locking around bpf now.

ok dlg@


# 1.76 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.75 20-Feb-2015 chris

Now that if_input() is a thing, use it

ok dlg@


# 1.74 18-Feb-2015 dlg

myri employees and their drivers for linux and solaris have repeatedly
told me that if you're going to rx into buffers greater than 4k in
size, they have to be aligned to a 4k boundary.

the mru of this chip is 9k, but ive been using the 12k mcl pool to
provide the alignment. however, if we move to putting 8 items on a
pool page there'll be enough slack space in the mcl12k pool pages
to allow item colouring, which in turn will break the chip requirement
above. in practice the chips i have seem to work fine with unaligned
buffers, but i dont want to risk breaking early revision chips.

this moves myx to using a private pool for allocating clusters for
the big rx ring. the item size is 9k, but we specify a 4k alignment
so every item we get out of it will be correct for the chip.


# 1.73 18-Feb-2015 dlg

enable pcie relaxed transaction ordering and bump the max payload
size up to 4k.

found while reading someone elses driver.


# 1.72 22-Dec-2014 tedu

unifdef INET


# 1.71 28-Oct-2014 dlg

the if_rxring accounting would get screwed up if the first mbuf to
be put on the ring couldnt be allocated.

this pulls the code that puts the mbufs on the ring out of myx_rx_fill
so it can return early if firstmb cant be allocated, which puts it
in the right place to return unused slots to the if_rxring.

this means myx rx wont lock up if you're DoSsed to the point where
you exhaust your mbuf pools and cant allocate mbufs for the ring.

ok jmatthew@


# 1.70 04-Oct-2014 dlg

replace mutexes to serialise the operations on the flag that restricts
the number of contexts that are refilling the rx rings with atomic
ops.

this is borrowed from code i wrote for the scsi midlayer but cant
put in yet because i havent got atomic.h up to scrach on all archs
yet. the archs myx runs on do have enough atomic.h to be fine though.


# 1.69 03-Oct-2014 dlg

refill the rx ring in myx_rxeof, not much later at the end of myx_intr.


# 1.68 03-Oct-2014 dlg

in rxeof, instead of taking the biglock on every packet to call bpf
and ether_input, queue all the mbufs onto an mbuf_list on the stack
and then take the biglock once outside the loop.


# 1.67 03-Oct-2014 dlg

we dont need the kernel lock to call bus_dmamap_load and unload thanks
to ketenis.

move the if_ipacket and if_opacket increments out of biglock too.
theyre only updated from the interrupt handler, which is only run
on a single cpu so there's no chance of the update racing. everywhere
else only reads them.


# 1.66 03-Oct-2014 dlg

dont need to hold the kernel lock to call MCLGETI and m_freem now.


# 1.65 03-Oct-2014 dlg

dont take the kernel lock on every interrupt in case we might change
the link state or to clear OACTIVE, just take it when we know we
really are going to do those things.


# 1.64 14-Sep-2014 jsg

remove uneeded proc.h includes
ok mpi@ kspillner@


# 1.63 19-Aug-2014 dlg

in myx_start, replace

while (space) {
IFQ_POLL;
myx_dequeue(free descr);
IFQ_DEQUEUE;
etc;
}

with

while (space && myx_dequeue(free descr)) {
IFQ_DEQUEUE;
etc;
}


# 1.62 18-Aug-2014 dlg

dont rely on mbuf.h to provide pool.h.

ok miod@, who has offerred to help with any MD fallout
ok guenther@


Revision tags: OPENBSD_5_6_BASE
# 1.61 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.60 10-Jul-2014 dlg

rings that dont rx packets dont need to be refilled.


# 1.59 08-Jul-2014 dlg

cut things that relied on mclgeti for rx ring accounting/restriction over
to using if_rxr.

cut the reporting systat did over to the rxr ioctl.

tested as much as i can on alpha, amd64, and sparc64.
mpi@ has run it on macppc.
ok mpi@


# 1.58 17-Jun-2014 dlg

whitespace fix.

im sick of fixing this by hand on all my boxes while hacking on
other stuff and having it pollute my diffs.

no functional change.


# 1.57 24-Mar-2014 dlg

nothing after the irq ack posting relies on it being ordered.


Revision tags: OPENBSD_5_5_BASE
# 1.56 10-Feb-2014 dlg

the mac addresses you program with MYXCMD_SET_MCASTGROUP are in a different
format to the one used for MYXCMD_SET_LLADDR. for reasons.

this lets ospf work if you dont happen to have PROMISC enabled on your
interface like my production firewalls happen to have, which is why i
never noticed this before.


# 1.55 05-Feb-2014 dlg

after running myx(4) without biglock in production for a few days
i discovered that there's a race between the interrupt code and
myx_start which causes the count of free tx descriptors to get
distorted, which eventually leads to a permanent setting of
IFF_OACTIVE, which in turn prevents the driver from transmitting
packets.

fixing that went horribly wrong when i then discovered that there's
a race between the interrupt handler and myx_down, where the interrupt
can tell myx_down to wake up and free all the rings while the
interrupt handler is still looking at them. free panics for all.

this moves the handling of the tx free count under the biglock (for
now), and moves myx_up and myx_down to managing a "driver state"
variable independantly of the IFF_UP and IFF_RUNNING flags, and
very very careful reordering of the checks of that state variable
and the hardware state.

as a bonus we get to avoid excessive calls to myx_txeof and myx_rxeof
in the isr, and less stuff checked unconditionally. on the other
hand, the sc_state handling added some more checks so it might not
be a win overall.

tested on smp sparc64 with msi and nonmsi interrupts, and on amd64 smp
in production again.


# 1.54 31-Jan-2014 dlg

sc_function is set, but never used for anything useful. clean it up...


# 1.53 31-Jan-2014 dlg

sc_lladdr is never used, so we can get the space in the sc back.


# 1.52 23-Jan-2014 dlg

a lot of people have pointed out to me that taking a lock just to read an
int isnt necessary.


# 1.51 23-Jan-2014 dlg

factor the mutex/bus_space handling of the sts block out.


# 1.50 21-Jan-2014 dlg

introduce fine grained locking.

this doesnt give up the big lock coming from process context, only from
the interrupt side. it is excessively careful about when it takes
the big lock again. notably it goes to a lot of effort to not hold
a mutex while calling into other subsystems or before taking the
big lock.

ive been hitting it as hard as i can without problems.

intensely read by mpi@
ok claudio@ kettenis@


# 1.49 19-Jan-2014 dlg

white space fix


# 1.48 19-Jan-2014 dlg

introduce fine grained locking around the lists of packet handlers
myx maintains. this moves it away from relying on splnet to protect
them.


# 1.47 19-Jan-2014 dlg

hwflags is never used, so clean it up


# 1.46 19-Jan-2014 dlg

replace bcmp with memcmp


# 1.45 19-Jan-2014 dlg

bcopy to memcpy


# 1.44 19-Jan-2014 dlg

replace bzero with memset.


# 1.43 19-Jan-2014 dlg

all 64bit archs myx runs on support bus_space 8 things because of work i
did at n2k13.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.42 29-Jan-2013 brad

- Set ENETRESET within myx_ioctl() instead of calling myx_iff() directly, to be
consistent with other drivers.
- Clear IFF_ALLMULTI flag early and at the top of myx_iff().
- Set IFF_ALLMULTI when in promisc mode.

ok dlg@


# 1.41 25-Jan-2013 dlg

we go to a lot of effort to post the first tx descriptor last, but we
really should be trying to post everything except the flags field in the
first tx descriptor. this shuffles things around so the rest of that first
txd is posted as part of the "everything else" before its flags field.


# 1.40 25-Jan-2013 dlg

the myx_dmamem struct doesnt need a name.


# 1.39 21-Jan-2013 dlg

myx does reads and writes in one direction to packet buffers. lets try
STREAMING them.


# 1.38 15-Jan-2013 dlg

amd64 is currently broken cos it has no
bus_space_write_raw_region_8. disabling the use of it for now.


# 1.37 15-Jan-2013 dlg

use bus_space_write_raw_region_8 on 64bit archs when writing to the rings


# 1.36 14-Jan-2013 dlg

map the registers PREFETCHABLE so things that can do write combining can
try and do write combining like the myx doco likes.


# 1.35 14-Jan-2013 dlg

avoid extra bus_space barriers in the interrupt handler.


# 1.34 14-Jan-2013 dlg

when posting descriptors to the chips rings, avoid going write barrier
write barrier write barrier when using myx_write to post descriptors.

instead let it go write write write barrier by using the appropriate
bus_space write directly followed by a single bus_space barrier.

the story above is mostly true, except that myx wants us to write all the
descriptors except the first, barrier, and then write the first one out to
signal that the chip can proceed.

it is also worth noting that the barriers cover more address space than
what we actually wrote to. this makes the code much simpler, and avoids
generating extra fence operations (which is what barrier functions end up
as on most of our archs) when we wrap around the end of the ring. the
bus_space doco encourages this.

bus_space use was discussed with krw@ kettenis@ deraadt@
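
The posting order described in 1.33 and 1.34 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the driver's code: the ring lives in ordinary memory, `struct desc`, `RING_SIZE` and `post_chain` are hypothetical names, and a C11 `atomic_thread_fence` stands in for the single bus_space barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical ring size, must be a power of two */

/* hypothetical descriptor layout, for illustration only */
struct desc {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
};

/*
 * Post `count` descriptors starting at ring slot `first`.
 * Everything except the first descriptor is written, one barrier is
 * issued for the whole batch, and only then is the first descriptor
 * written to hand the chain over to the (imaginary) chip.
 */
static void
post_chain(struct desc *ring, uint32_t first, const struct desc *chain,
    size_t count)
{
	size_t i;

	assert(count >= 1 && count <= RING_SIZE);

	/* write write write: all descriptors after the first one */
	for (i = 1; i < count; i++)
		ring[(first + i) & (RING_SIZE - 1)] = chain[i];

	/* one barrier; stands in for a single bus_space barrier */
	atomic_thread_fence(memory_order_release);

	/* the first descriptor goes out last and acts as the signal */
	ring[first & (RING_SIZE - 1)] = chain[0];
}
```

The index mask lets a chain wrap around the end of the ring without a special case, which is the situation the commit message says the wide barrier keeps simple.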


# 1.33 14-Jan-2013 dlg

the myri doco suggests its nice to post stuff by filling in everything
in the rings except the first descriptor. once you've written as
much as you can out, then you go back and post the first descriptor
to signal that the chip should go ahead and work.


# 1.32 14-Jan-2013 dlg

;; is a long way of saying ;


# 1.31 29-Nov-2012 brad

Remove setting an initial assumed baudrate upon driver attach which is not
necessarily correct, there might not even be a link when attaching.

ok mikeb@ reyk@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.30 28-Nov-2011 blambert

Fix reversed error-handling gotos in myx_buf_fill(), which would lead to
either an mbuf leak or a NULL pointer dereference.

ok sthen@ claudio@ dlg@
testing claudio@ dlg@


Revision tags: OPENBSD_5_0_BASE
# 1.29 08-Aug-2011 dlg

myx requires the driver pad short ethernet frames to 60 bytes by
adding a descriptor pointing at zeroed bytes onto the end of transmit
chains. i was accounting for this extra descriptor when i was
completing the chain, but not when i was setting this up. this
meant the number of free descriptors kept growing until it overflowed.
at this point the check for space in the ring failed and packets
no longer flowed.

this counts the pad descriptor in the tx chain setup too.

ok deraadt@
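
A minimal sketch of the accounting rule behind this fix, assuming a simple free counter: the same descriptor count, pad descriptor included, is used both when a chain is set up and when it is completed. `struct tx_ring`, `tx_desc_count`, `tx_reserve` and `tx_release` are hypothetical names, not the driver's.

```c
#include <stddef.h>

#define MIN_FRAME_LEN 60u	/* minimum frame length from the commit message */

/* hypothetical ring state: only the free descriptor count matters here */
struct tx_ring {
	unsigned int free;
};

/*
 * Descriptors used by one packet: one per DMA segment, plus one pad
 * descriptor when the frame is shorter than 60 bytes.
 */
static unsigned int
tx_desc_count(unsigned int nsegs, size_t pktlen)
{
	return nsegs + (pktlen < MIN_FRAME_LEN ? 1u : 0u);
}

/* setup side: returns 1 and consumes descriptors if there is room */
static int
tx_reserve(struct tx_ring *r, unsigned int nsegs, size_t pktlen)
{
	unsigned int n = tx_desc_count(nsegs, pktlen);

	if (n > r->free)
		return 0;
	r->free -= n;
	return 1;
}

/* completion side: gives back exactly what the setup side took */
static void
tx_release(struct tx_ring *r, unsigned int nsegs, size_t pktlen)
{
	r->free += tx_desc_count(nsegs, pktlen);
}
```

Because both sides call `tx_desc_count`, the counter cannot drift the way the commit message describes.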


# 1.28 23-Jun-2011 dlg

cope with empty rx rings by scheduling a timeout to keep trying until it
gets some packets onto the rings.

also annoying, but the hardware doesnt report empty rings, we have to
handle it ourselves.


# 1.27 23-Jun-2011 dlg

this chip has an annoying "feature" where it cannot report the link
state unless the chip is up and handling packets. while its down
it does not report the link state, so it is unknown.

this tweaks the link state handling, in particular it adds code to
myx_down so it moves the link state to unknown, ie, it correctly
reflects reality.

stupidity pointed out by deraadt


# 1.26 22-Jun-2011 deraadt

reset the tx_count on UP, since it may have been advanced from non-zero
by a previous use
ok claudio


# 1.25 22-Jun-2011 dlg

msi support. this is a complicated one...

ok kettenis@


# 1.24 22-Jun-2011 jsg

another myri10ge device matched by freebsd/linux drivers
ok dlg@


# 1.23 22-Jun-2011 dlg

oops, handle refill like i said i was going to two revisions ago.


# 1.22 22-Jun-2011 deraadt

set the mac address on the chip correctly (repair the byte order)
it now works on sparc64, too
ok dlg


# 1.21 22-Jun-2011 dlg

deraadt plugged his myx into a sparc64 and discovered 3 problems:

1. we want to write raw values to registers all the time, so promote the
myx_raw{read,write} to myx_{read,write} and use them everywhere. get rid
of the raw funcs.
2. i was setting the watermarks on the rx ring before knowing how big
they were.
3. rxfill in the interrupt handler could lose data if you loop on
sts_isvalid.

almost working now...

"please commit your diff" deraadt@


# 1.20 21-Jun-2011 dlg

do the unaligned dma tests so we can figure out if we need to fall
back to the unaligned firmware. apparently this is only an issue
on the "A" controllers which have been superseded, but those are
the chips we (openbsd devs) have.


# 1.19 21-Jun-2011 dlg

report the controllers part number. eg, i now know i have a
10G-PCIE-8A-R. dmesg looks like this:

myx0 at pci4 dev 0 function 0 "Myricom Z8E" rev 0x00: apic 1 int 8, model 10G-PCIE-8A-R, address 00:60:dd:47:c6:74


# 1.18 21-Jun-2011 dlg

wire up jumbos properly. the hardware supports up to 9018 bytes off
the wire (9000 + ether header + vlan tag), but has some cool alignment
requirements. if you want to use a single rx ring desc to point at
a jumbo it needs to start on a 4k boundary and be physically
contiguous. to ensure this im pulling frames from the 12k pool and
waiting for arianes diff to ensure mbufs are contig.

direction from andrew gallatin. tested locally.
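
The arithmetic and the alignment rule in this message can be checked with a small sketch. The 14 byte ethernet header and 4 byte vlan tag are the standard sizes; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define JUMBO_PAYLOAD	9000u	/* payload size quoted in the commit message */
#define ETHER_HDR	14u	/* standard ethernet header length */
#define VLAN_TAG	4u	/* 802.1Q tag length */
#define RX_ALIGN	4096u	/* the 4k boundary the commit message mentions */

/* largest frame off the wire: payload + ether header + vlan tag */
static unsigned int
jumbo_frame_len(void)
{
	return JUMBO_PAYLOAD + ETHER_HDR + VLAN_TAG;
}

/* true if a buffer address starts on a 4k boundary */
static int
is_rx_aligned(uintptr_t addr)
{
	return (addr & (RX_ALIGN - 1)) == 0;
}
```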


# 1.17 21-Jun-2011 deraadt

minor cleanups; ok dlg


# 1.16 20-Jun-2011 dlg

make the interrupt handler look more like what the doco suggests. seems to
fix a bad lockup i kept getting.


# 1.15 20-Jun-2011 dlg

dont need debug, the myx_cmd stuff works fine.


# 1.14 20-Jun-2011 dlg

i got myx working!


# 1.13 02-May-2011 chl

Do not check malloc return value against NULL, as M_WAITOK is used.

ok dlg@ krw@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.12 19-May-2010 oga

BUS_DMA_ZERO instead of alloc, map, bzero.

ok krw@


Revision tags: OPENBSD_4_7_BASE
# 1.11 13-Aug-2009 jasper

- consistify cfdriver for the ethernet drivers (0 -> NULL)

ok dlg@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.10 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.9 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.8 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.7 23-May-2008 brad

Simplify the combination use of pci_mapreg_type()/pci_mapreg_map() as
suggested by dlg@ awhile ago.

ok dlg@


Revision tags: OPENBSD_4_3_BASE
# 1.6 16-Jan-2008 thib

Set the baudrate with IF_Gbps(10); and remove an
XXX comment now that if_baudrate is 64bits.

ok reyk@


Revision tags: OPENBSD_4_2_BASE
# 1.5 01-Jun-2007 reyk

initialize the rings


# 1.4 31-May-2007 reyk

further improvement of the bus space i/o. firmware loading, booting,
and card initialization works now.

thanks to dlg@ who pointed me to the fact that
bus_space_write_region_N and bus_space_write_raw_region_N use count of
elements vs. size of buffer arguments.
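
The count-versus-size distinction can be illustrated with a stand-in. `write_region_4` below is a hypothetical function that only mimics the convention of taking a count of 4 byte elements; it is not the real bus_space interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-in: copies `count` 32-bit elements, not bytes */
static void
write_region_4(uint32_t *dst, const uint32_t *src, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		dst[i] = src[i];
}

/* convert a buffer size in bytes into a count of 4 byte elements */
static size_t
bytes_to_elems_4(size_t nbytes)
{
	assert(nbytes % sizeof(uint32_t) == 0);
	return nbytes / sizeof(uint32_t);
}
```

Passing `sizeof(buf)` directly as the count would ask for four times as many elements as the buffer holds, so the conversion has to happen at the call site.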


# 1.3 31-May-2007 reyk

enable all debugging messages by default if the driver is compiled with
MYX_DEBUG


# 1.2 31-May-2007 reyk

fix the myx_write function


# 1.1 31-May-2007 reyk

initial bits of a new driver for the Myricom Myri-10G Lanai-Z8E 10Gb
Ethernet chipset. not working yet.

ok dlg@


# 1.112 27-Nov-2020 kevlo

Add initialization of sc_sff_lock rwlock.

ok semarie@


Revision tags: OPENBSD_6_8_BASE
# 1.111 17-Jul-2020 dlg

name the rx rings so systat mb shows them.


# 1.110 17-Jul-2020 dlg

add kstats to myx.

myx is unusually minimal, so there's not a lot of information that
the chip provides. the most interesting is the number of packets
the chip drops cos of a lack of space on the rx rings.


# 1.109 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.108 03-Jul-2019 dlg

use ifiq_input return values to apply backpressure to rings.


# 1.107 16-Apr-2019 dlg

i2c reads are more reliable a byte at a time.

reading all 256 at a time was a nice idea, but meant page 0xa2 wasnt
appearing like it should. this follows what freebsd does more
closely too.


# 1.106 16-Apr-2019 dlg

make sff page reads work on little endian archs too. like amd64.

some modules seem to need more time when waiting for bytes while here.

hrvoje popovski hit the endian issue


# 1.105 15-Apr-2019 dlg

implement SIOCGIFSFFPAGE so ifconfig can get transceiver info.

myx doesn't allow i2c writes, so you can only read whatever page
the firmware is already pointing at on device 0xa0. if you try to
read another page it will return ENXIO.

tested on a 10G-PCIE-8A-R with an xfp module.


# 1.104 15-Apr-2019 dlg

trim some debug code that printed out the name of a command

the list of commands is going to grow, but the thought of keeping
the list in debug code up to date with it just makes me feel tired.

this prints the command id number instead in the same format we
represent it in the header.


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.103 01-Aug-2017 dlg

defer init of the myxmcl pool to mountroot, and enable pool cpu caches.

pool_cache_init cannot be called during autoconf because we cant
be confident about the number of cpus in the machine until the first
run of attaches.

mountroot is after autoconf, and myx already has code that runs
there for the firmware loading.

discussed with deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.102 07-Feb-2017 dlg

move the mbuf pools to m_pool_init and a single global memory limit

this replaces individual calls to pool_init, pool_set_constraints, and
pool_sethardlimit with calls to m_pool_init. m_pool_init inits the
mbuf pools with the mbuf pool allocator, and because of that doesnt
set per pool limits.

ok bluhm@ as part of a larger diff


# 1.101 24-Jan-2017 dlg

add support for multiple transmit ifqueues per network interface.

an ifq to transmit a packet is picked by the current traffic
conditioner (ie, priq or hfsc) by providing an index into an array
of ifqs. by default interfaces get a single ifq but can ask for
more using if_attach_queues().

the vast majority of our drivers still think there's a 1:1 mapping
between interfaces and transmit queues, so their if_start routines
take an ifnet pointer instead of a pointer to the ifqueue struct.
instead of changing all the drivers in the tree, drivers can opt
into using an if_qstart routine and setting the IFXF_MPSAFE flag.
the stack provides a compatibility wrapper from the new if_qstart
handler to the previous if_start handlers if IFXF_MPSAFE isnt set.

enabling hfsc on an interface configures it to transmit everything
through the first ifq. any other ifqs are left configured as priq,
but unused, when hfsc is enabled.

getting this in now so everyone can kick the tyres.

ok mpi@ visa@ (who provided some tweaks for cnmac).


# 1.100 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.99 31-Oct-2016 dlg

turns out these chips can handle buffers up to 9400 bytes in length.

raise the mtu to 9380 bytes so we can take advantage of the extra space.

i need to revisit the macro names at some point.


# 1.98 31-Oct-2016 dlg

revert 1.97 where i moved myx to using the system pools

my early revision board doesnt like it at all


# 1.97 28-Oct-2016 dlg

get rid of the custom pool in myx for jumbo frames.

now it asks the mbuf layer for the 9k from its pools.

a question from chris@ made me go look at the chip doco again and i
realised that the chip only requires 4 byte alignment for rx buffers,
not 4k alignment for jumbo buffers.

i also found that the chip is supposed to be able to rx up to 9400
bytes instead of 9000. ill fix that later though.


# 1.96 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


Revision tags: OPENBSD_6_0_BASE
# 1.95 23-May-2016 tedu

remove the function pointer from mbufs. this memory is shared with data
via unions, and we don't want to make it easy to control the target.
instead an integer index into an array of acceptable functions is used.
drivers using custom functions must register them to receive an index.
ok deraadt


# 1.94 13-Apr-2016 mpi

G/C IFQ_SET_READY().


# 1.93 13-Apr-2016 mpi

G/C IFQ_SET_READY().


Revision tags: OPENBSD_5_9_BASE
# 1.92 11-Dec-2015 mpi

Replace mountroothook_establish(9) by config_mountroot(9) a narrower API
similar to config_defer(9).

ok mikeb@, deraadt@


# 1.91 09-Dec-2015 dlg

rework the if_start mpsafe serialisation so it can serialise arbitrary work

work is represented by struct task.

the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.

this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified in between when a start
routine discovers theres no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.

by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.

tested on various nics
ok mpi@


# 1.90 03-Dec-2015 dlg

tell the stack myx_start is mpsafe.

as per the stack commit, the driver changes are:

1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)


# 1.89 01-Dec-2015 dlg

myx doesnt use atomic.h anymore.


# 1.88 25-Nov-2015 dlg

replace IFF_OACTIVE manipulation with mpsafe operations.

there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.

IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operates mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.

instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.

this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifq_{set,clr,is}_oactive API too.

ok kettenis@ mpi@ jmatthew@ deraadt@


# 1.87 24-Nov-2015 dlg

fix tx ring accounting in myx_start.

turns out i was calculating the number of packets (not descriptors)
on the tx ring, and then using that as the free space for descriptors.


# 1.86 19-Nov-2015 dlg

get rid of sc_tx_free and the atomic ops on it in myx_start and myx_txeof.

myx_start calculates the free space by reading the consumer index
and doing some maths, which lets us avoid the interlocked cpu ops.
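
One way the "maths" could look, assuming free-running 32-bit producer and consumer counters (an assumption; the commit message does not say how the driver's indexes are laid out). `ring_free` is a hypothetical name.

```c
#include <stdint.h>

/*
 * Free descriptors on a ring of `size` entries, given free-running
 * producer and consumer counters. Unsigned subtraction keeps the
 * result correct when the counters wrap around.
 */
static uint32_t
ring_free(uint32_t prod, uint32_t cons, uint32_t size)
{
	return size - (prod - cons);
}
```

Only reads of the two counters are needed, which is why no interlocked op has to be done per packet.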


# 1.85 25-Oct-2015 mpi

arp_ifinit() is no longer needed.


# 1.84 29-Sep-2015 dlg

get rid of the mutex between access to the status block and myx_down

myx is unusual in that it has an explicit command to shut down the
chip that gets an interrupt when it's done. so myx_down sends the
command and has to sleep until it gets that interrupt. this moves
to using a single int to represent that state (so loads and stores
are atomic), and sleep_setup/sleep_finish in myx_down to wait for
it to change.

this has been running in production at work for a few months now
tested by chris@


# 1.83 01-Sep-2015 deraadt

free() firmware with right len; ok dlg


# 1.82 15-Aug-2015 dlg

do the global tx free accounting in myx_start with a single atomic op
instead of one per packet.

seems to let me send packets a little faster.


# 1.81 15-Aug-2015 dlg

rework the tx path to use a ring to keep track of dmamaps/mbufs.

this removes the myx_buf structure and uses myx_slot instead. theyre
the same except slots dont have list entry structures, so theyre
smaller.

this cuts out four mutex ops per packet out of the tx handling.
just have to get rid of the atomic op per packet in myx_start now.


# 1.80 14-Aug-2015 dlg

move to a per rx ring timeout for refilling empty rings.

this lets me get rid of the locking around the refilling of the rx ring.

the timeout only runs refill if the rx ring is empty. we know rxeof
wont try and refill it in that situation because there's no packets
on the ring so we wont get interrupts for it. therefore we dont
need to lock between the timeout and rxeof cos they cant run at the
same time.


# 1.79 14-Aug-2015 dlg

rework how we track the packets on the rx rings.

originally there were two mutex protected lists for rx packets, a
list of free packets, and a list of packets that were on the ring.
filling the ring popped packets off the free list, attached an mbuf
and dmamapped it, and pushed it onto the list of active packets.
the hw fills packets in order, so on rx completion we'd pop packets
off the active list, unmap the mbuf and shove it up the stack before
putting the packet on the free list.

the problem with the lists is that every rx ring operation resulted
in two mutex ops. so 4 mutex ops per packet after you do both fill
and rxeof.

this replaces the mutexed lists with rings that shadow the hardware
rings. filling the rx ring pushes a producer index along, while
rxeof chases it with a consumer. because we know only one thing can
do either of those tasks at a time, we can get away with not using
atomic ops for them.

there's more to be done, but this is a good first step.
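
The shadow ring idea can be sketched as a single-producer, single-consumer ring. This is a simplified illustration with hypothetical names (`struct shadow_ring`, `ring_fill`, `ring_rxeof`); it assumes, as the message does, that only one context fills and only one context completes at a time.

```c
#include <stddef.h>

#define NSLOTS 4u	/* hypothetical ring size, must be a power of two */

struct slot {
	void *pkt;	/* stands in for the mbuf and dmamap of a slot */
};

struct shadow_ring {
	struct slot slots[NSLOTS];
	unsigned int prod;	/* advanced only by the fill path */
	unsigned int cons;	/* advanced only by the rx completion path */
};

/* fill path: put a packet on the ring, returns 0 if the ring is full */
static int
ring_fill(struct shadow_ring *r, void *pkt)
{
	if (r->prod - r->cons == NSLOTS)
		return 0;
	r->slots[r->prod & (NSLOTS - 1)].pkt = pkt;
	r->prod++;
	return 1;
}

/* completion path: take the oldest packet, or NULL if the ring is empty */
static void *
ring_rxeof(struct shadow_ring *r)
{
	struct slot *s;
	void *pkt;

	if (r->cons == r->prod)
		return NULL;
	s = &r->slots[r->cons & (NSLOTS - 1)];
	pkt = s->pkt;
	s->pkt = NULL;
	r->cons++;
	return pkt;
}
```

Each index has exactly one writer, so neither path needs a mutex or an atomic op to advance its own index in this sketch.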


Revision tags: OPENBSD_5_8_BASE
# 1.78 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.77 17-May-2015 chris

We don't need KERNEL_LOCK() around if_input() anymore, as if_input() has
appropriate locking around bpf now.

ok dlg@


# 1.76 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.75 20-Feb-2015 chris

Now that if_input() is a thing, use it

ok dlg@


# 1.74 18-Feb-2015 dlg

myri employees and their drivers for linux and solaris have repeatedly
told me that if you're going to rx into buffers greater than 4k in
size, they have to be aligned to a 4k boundary.

the mru of this chip is 9k, but ive been using the 12k mcl pool to
provide the alignment. however, if we move to putting 8 items on a
pool page there'll be enough slack space in the mcl12k pool pages
to allow item colouring, which in turn will break the chip requirement
above. in practice the chips i have seem to work fine with unaligned
buffers, but i dont want to risk breaking early revision chips.

this moves myx to using a private pool for allocating clusters for
the big rx ring. the item size is 9k, but we specify a 4k alignment
so every item we get out of it will be correct for the chip.


# 1.73 18-Feb-2015 dlg

enable pcie relaxed transaction ordering and bump the max payload
size up to 4k.

found while reading someone elses driver.


# 1.72 22-Dec-2014 tedu

unifdef INET


# 1.71 28-Oct-2014 dlg

the if_rxring accounting would get screwed up if the first mbuf to
be put on the ring couldnt be allocated.

this pulls the code that puts the mbufs on the ring out of myx_rx_fill
so it can return early if firstmb cant be allocated, which puts it
in the right place to return unused slots to the if_rxring.

this means myx rx wont lock up if you're DoSsed to the point where
you exhaust your mbuf pools and cant allocate mbufs for the ring.

ok jmatthew@


# 1.70 04-Oct-2014 dlg

replace mutexes to serialise the operations on the flag that restricts
the number of contexts that are refilling the rx rings with atomic
ops.

this is borrowed from code i wrote for the scsi midlayer but cant
put in yet because i havent got atomic.h up to scratch on all archs
yet. the archs myx runs on do have enough atomic.h to be fine though.
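
One way a flag like this could be built on atomic ops is sketched below with C11 atomics. It is an assumption about the general pattern, not a copy of the driver: `refill_enter` and `refill_leave` are hypothetical names, and the kernel's own atomic primitives are not used here.

```c
#include <stdatomic.h>

/*
 * Try to become the one context that refills the ring.
 * Returns 1 if the caller should do the refill, 0 if someone else is
 * already doing it (the increment tells them to go around again).
 */
static int
refill_enter(atomic_uint *running)
{
	return atomic_fetch_add(running, 1) == 0;
}

/*
 * Finish a refill pass. Returns 1 if the caller is done, or 0 if
 * another context asked for a refill in the meantime, in which case
 * the caller should refill again before calling this once more.
 */
static int
refill_leave(atomic_uint *running)
{
	unsigned int expected = 1;

	if (atomic_compare_exchange_strong(running, &expected, 0))
		return 1;
	/* collapse all pending requests into one more pass */
	atomic_store(running, 1);
	return 0;
}
```

A caller would typically run `if (refill_enter(&flag)) { do { refill(); } while (!refill_leave(&flag)); }`, where `refill` is whatever actually fills the ring.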


# 1.69 03-Oct-2014 dlg

refill the rx ring in myx_rxeof, not much later at the end of myx_intr.


# 1.68 03-Oct-2014 dlg

in rxeof, instead of taking the biglock on every packet to call bpf
and ether_input, queue all the mbufs onto an mbuf_list on the stack
and then take the biglock once outside the loop.
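
The batching described here can be sketched with a mock lock that only counts how often it is taken. Everything below is hypothetical (`struct pkt`, `struct pkt_list`, `big_lock`, `rx_batch`); the real code uses mbufs, an mbuf_list and the kernel lock.

```c
#include <stddef.h>

struct pkt {
	struct pkt *next;
};

/* simple tail queue, standing in for a list of received packets */
struct pkt_list {
	struct pkt *head;
	struct pkt **tail;
};

static unsigned int lock_acquisitions;	/* times the mock lock was taken */
static unsigned int delivered;		/* packets handed to the "stack" */

static void big_lock(void) { lock_acquisitions++; }
static void big_unlock(void) { }

static void
pl_init(struct pkt_list *pl)
{
	pl->head = NULL;
	pl->tail = &pl->head;
}

static void
pl_enqueue(struct pkt_list *pl, struct pkt *p)
{
	p->next = NULL;
	*pl->tail = p;
	pl->tail = &p->next;
}

/* collect n received packets without the lock, then deliver them under it */
static void
rx_batch(struct pkt *ring, size_t n)
{
	struct pkt_list pl;
	struct pkt *p;
	size_t i;

	pl_init(&pl);
	for (i = 0; i < n; i++)
		pl_enqueue(&pl, &ring[i]);	/* no lock held here */

	big_lock();				/* taken once per batch */
	for (p = pl.head; p != NULL; p = p->next)
		delivered++;
	big_unlock();
}
```

The lock is taken once per call to `rx_batch` rather than once per packet, which is the change the commit message describes.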


# 1.67 03-Oct-2014 dlg

we dont need the kernel lock to call bus_dmamap_load and unload thanks
to kettenis.

move the if_ipacket and if_opacket increments out of biglock too.
theyre only updated from the interrupt handler, which is only run
on a single cpu so there's no chance of the update racing. everywhere
else only reads them.


# 1.66 03-Oct-2014 dlg

dont need to hold the kernel lock to call MCLGETI and m_freem now.


# 1.65 03-Oct-2014 dlg

dont take the kernel lock on every interrupt in case we might change
the link state or to clear OACTIVE, just take it when we know we
really are going to do those things.


# 1.64 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


# 1.63 19-Aug-2014 dlg

in myx_start, replace

while (space) {
IFQ_POLL;
myx_dequeue(free descr);
IFQ_DEQUEUE;
etc;
}

with

while (space && myx_dequeue(free descr)) {
IFQ_DEQUEUE;
etc;
}


# 1.62 18-Aug-2014 dlg

dont rely on mbuf.h to provide pool.h.

ok miod@, who has offered to help with any MD fallout
ok guenther@


Revision tags: OPENBSD_5_6_BASE
# 1.61 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.60 10-Jul-2014 dlg

rings that dont rx packets dont need to be refilled.


# 1.59 08-Jul-2014 dlg

cut things that relied on mclgeti for rx ring accounting/restriction over
to using if_rxr.

cut the reporting systat did over to the rxr ioctl.

tested as much as i can on alpha, amd64, and sparc64.
mpi@ has run it on macppc.
ok mpi@


# 1.58 17-Jun-2014 dlg

whitespace fix.

im sick of fixing this by hand on all my boxes while hacking on
other stuff and having it pollute my diffs.

no functional change.


# 1.57 24-Mar-2014 dlg

nothing after the irq ack posting relies on it being ordered.


Revision tags: OPENBSD_5_5_BASE
# 1.56 10-Feb-2014 dlg

the mac addresses you program with MYXCMD_SET_MCASTGROUP are in a different
format to the one used for MYXCMD_SET_LLADDR. for reasons.

this lets ospf work if you dont happen to have PROMISC enabled on your
interface like my production firewalls happen to have, which is why i
never noticed this before.


# 1.55 05-Feb-2014 dlg

after running myx(4) without biglock in production for a few days
i discovered that there's a race between the interrupt code and
myx_start which causes the count of free tx descriptors to get
distorted, which eventually leads to a permanent setting of
IFF_OACTIVE, which in turn prevents the driver from transmitting
packets.

fixing that went horribly wrong when i then discovered that there's
a race between the interrupt handler and myx_down, where the interrupt
can tell myx_down to wake up and free all the rings while the
interrupt handler is still looking at them. free panics for all.

this moves the handling of the tx free count under the biglock (for
now), and moves myx_up and myx_down to managing a "driver state"
variable independently of the IFF_UP and IFF_RUNNING flags, and
very very careful reordering of the checks of that state variable
and the hardware state.

as a bonus we get to avoid excessive calls to myx_txeof and myx_rxeof
in the isr, and less stuff checked unconditionally. on the other
hand, the sc_state handling added some more checks so it might not
be a win overall.

tested on smp sparc64 with msi and nonmsi interrupts, and on amd64 smp
in production again.


# 1.54 31-Jan-2014 dlg

sc_function is set, but never used for anything useful. clean it up...


# 1.53 31-Jan-2014 dlg

sc_lladdr is never used, so we can get the space in the sc back.


# 1.52 23-Jan-2014 dlg

a lot of people have pointed out to me that taking a lock just to read an
int isnt necessary.


# 1.51 23-Jan-2014 dlg

factor the mutex/bus_space handling of the sts block out.


# 1.50 21-Jan-2014 dlg

introduce fine grained locking.

this doesnt give up the big lock coming from process context, only from
the interrupt side. it is excessively careful about when it takes
the big lock again. notably it goes to a lot of effort to not hold
a mutex while calling into other subsystems or before taking the
big lock.

ive been hitting it as hard as i can without problems.

intensely read by mpi@
ok claudio@ kettenis@


# 1.49 19-Jan-2014 dlg

white space fix


# 1.48 19-Jan-2014 dlg

introduce fine grained locking around the lists of packet handlers
myx maintains. this moves it away from relying on splnet to protect
them.


# 1.47 19-Jan-2014 dlg

hwflags is never used, so clean it up


# 1.46 19-Jan-2014 dlg

replace bcmp with memcmp


# 1.45 19-Jan-2014 dlg

bcopy to memcpy


# 1.44 19-Jan-2014 dlg

replace bzero with memset.


# 1.43 19-Jan-2014 dlg

all 64bit archs myx runs on support bus_space 8 things because of work i
did at n2k13.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.42 29-Jan-2013 brad

- Set ENETRESET within myx_ioctl() instead of calling myx_iff() directly, to be
consistent with other drivers.
- Clear IFF_ALLMULTI flag early and at the top of myx_iff().
- Set IFF_ALLMULTI when in promisc mode.

ok dlg@


# 1.41 25-Jan-2013 dlg

we go to a lot of effort to post the first tx descriptor last, but we
really should be trying to post everything except the flags field in the
first tx descriptor. this shuffles things around so the rest of that first
txd is posted as part of the "everything else" before its flags field.


# 1.40 25-Jan-2013 dlg

the myx_dmamem struct doesnt need a name.


# 1.39 21-Jan-2013 dlg

myx does reads and writes in one direction to packet buffers. lets try
STREAMING them.


# 1.38 15-Jan-2013 dlg

amd64 is currently broken cos it has no
bus_space_write_raw_region_8. disabling the use of it for now.


# 1.37 15-Jan-2013 dlg

use bus_space_write_raw_region_8 on 64bit archs when writing to the rings


# 1.36 14-Jan-2013 dlg

map the registers PREFETCHABLE so things that can do write combining can
try and do write combining like the myx doco likes.


# 1.35 14-Jan-2013 dlg

avoid extra bus_space barriers in the interrupt handler.


# 1.34 14-Jan-2013 dlg

when posting descriptors to the chips rings, avoid going write barrier
write barrier write barrier when using myx_write to post descriptors.

instead let it go write write write barrier by using the appropriate
bus_space write directly followed by a single bus_space barrier.

the story above is mostly true, except that myx wants us to write all the
descriptors except the first, barrier, and then write the first one out to
signal that the chip can proceed.

it is also worth noting that the barriers cover more address space than
what we actually wrote to. this makes the code much simpler, and avoids
generating extra fence operations (which is what barrier functions end up
as on most of our archs) when we wrap around the end of the ring. the
bus_space doco encourages this.

bus_space use was discussed with krw@ kettenis@ deraadt@


# 1.33 14-Jan-2013 dlg

the myri doco suggests its nice to post stuff by filling in everything
in the rings except the first descriptor. once you've written as
much as you can out, then you go back and post the first descriptor
to signal that the chip should go ahead and work.


# 1.32 14-Jan-2013 dlg

;; is a long way of saying ;


# 1.31 29-Nov-2012 brad

Remove setting an initial assumed baudrate upon driver attach which is not
necessarily correct, there might not even be a link when attaching.

ok mikeb@ reyk@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.30 28-Nov-2011 blambert

Fix reversed error-handling gotos in myx_buf_fill(), which would lead to
either an mbuf leak or a NULL pointer dereference.

ok sthen@ claudio@ dlg@
testing claudio@ dlg@


Revision tags: OPENBSD_5_0_BASE
# 1.29 08-Aug-2011 dlg

myx requires the driver pad short ethernet frames to 60 bytes by
adding a descriptor pointing at zeroed bytes onto the end of transmit
chains. i was accounting for this extra descriptor when i was
completing the chain, but not when i was setting this up. this
meant the number of free descriptors kept growing until it overflowed.
at this point the check for space in the ring failed and packets
no longer flowed.

this counts the pad descriptor in the tx chain setup too.

ok deraadt@
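
A minimal sketch of the accounting described in 1.29, not the driver's code. The helper name tx_desc_count and the MIN_FRAME_LEN macro are hypothetical. The point is that setup and completion must use the same count, including the pad descriptor.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_FRAME_LEN	60	/* frames shorter than this get padded */

/*
 * Descriptors needed for a packet of pktlen bytes that was mapped
 * into nsegs DMA segments. A short frame needs one extra descriptor
 * pointing at zeroed pad bytes.
 */
static unsigned int
tx_desc_count(size_t pktlen, unsigned int nsegs)
{
	return (nsegs + (pktlen < MIN_FRAME_LEN ? 1 : 0));
}
```

If the setup side subtracts this value from the free count and the completion side adds the same value back, the free count cannot drift.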


# 1.28 23-Jun-2011 dlg

cope with empty rx rings by scheduling a timeout to keep trying until it
gets some packets onto the rings.

also annoying, but the hardware doesnt report empty rings, we have to
handle it ourselves.


# 1.27 23-Jun-2011 dlg

this chip has an annoying "feature" where it cannot report the link
state unless the chip is up and handling packets. while its down
it does not report the link state, so it is unknown.

this tweaks the link state handling, in particular it adds code to
myx_down so it moves the link state to unknown, ie, it correctly
reflects reality.

stupidity pointed out by deraadt


# 1.26 22-Jun-2011 deraadt

reset the tx_count on UP, since it may have been advanced from non-zero
by a previous use
ok claudio


# 1.25 22-Jun-2011 dlg

msi support. this is a complicated one...

ok kettenis@


# 1.24 22-Jun-2011 jsg

another myri10ge device matched by freebsd/linux drivers
ok dlg@


# 1.23 22-Jun-2011 dlg

oops, handle refill like i said i was going to two revisions ago.


# 1.22 22-Jun-2011 deraadt

set the mac address on the chip correctly (repair the byte order)
it now works on sparc64, too
ok dlg


# 1.21 22-Jun-2011 dlg

deraadt plugged his myx into a sparc64 and discovered 3 problems:

1. we want to write raw values to registers all the time, so promote the
myx_raw{read,write} to myx_{read,write} and use them everywhere. get rid
of the raw funcs.
2. i was setting the watermarks on the rx ring before knowing how big
they were.
3. rxfill in the interrupt handler could lose data if you loop on
sts_isvalid.

almost working now...

"please commit your diff" deraadt@


# 1.20 21-Jun-2011 dlg

do the unaligned dma tests so we can figure out if we need to fall
back to the unaligned firmware. apparently this is only an issue
on the "A" controllers which have been superseded, but those are
the chips we (openbsd devs) have.


# 1.19 21-Jun-2011 dlg

report the controllers part number. eg, i now know i have a
10G-PCIE-8A-R. dmesg looks like this:

myx0 at pci4 dev 0 function 0 "Myricom Z8E" rev 0x00: apic 1 int 8, model 10G-PCIE-8A-R, address 00:60:dd:47:c6:74


# 1.18 21-Jun-2011 dlg

wire up jumbos properly. the hardware supports up to 9018 bytes off
the wire (9000 + ether header + vlan tag), but has some cool alignment
requirements. if you want to use a single rx ring desc to point at
a jumbo it needs to start on a 4k boundary and be physically
contiguous. to ensure this im pulling frames from the 12k pool and
waiting for arianes diff to ensure mbufs are contig.

direction from andrew gallatin. tested locally.


# 1.17 21-Jun-2011 deraadt

minor cleanups; ok dlg


# 1.16 20-Jun-2011 dlg

make the interrupt handler look more like what the doco suggests. seems to
fix a bad lockup i kept getting.


# 1.15 20-Jun-2011 dlg

dont need debug, the myx_cmd stuff works fine.


# 1.14 20-Jun-2011 dlg

i got myx working!


# 1.13 02-May-2011 chl

Do not check malloc return value against NULL, as M_WAITOK is used.

ok dlg@ krw@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.12 19-May-2010 oga

BUS_DMA_ZERO instead of alloc, map, bzero.

ok krw@


Revision tags: OPENBSD_4_7_BASE
# 1.11 13-Aug-2009 jasper

- consistify cfdriver for the ethernet drivers (0 -> NULL)

ok dlg@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.10 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.9 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.8 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.7 23-May-2008 brad

Simplify the combination use of pci_mapreg_type()/pci_mapreg_map() as
suggested by dlg@ awhile ago.

ok dlg@


Revision tags: OPENBSD_4_3_BASE
# 1.6 16-Jan-2008 thib

Set the baudrate with IF_Gbps(10); and remove an
XXX comment now that if_baudrate is 64bits.

ok reyk@


Revision tags: OPENBSD_4_2_BASE
# 1.5 01-Jun-2007 reyk

initialize the rings


# 1.4 31-May-2007 reyk

further improvement of the bus space i/o. firmware loading, booting,
and card initialization works now.

thanks to dlg@ who pointed me to the fact that
bus_space_write_region_N and bus_space_write_raw_region_N use count of
elements vs. size of buffer arguments.


# 1.3 31-May-2007 reyk

enable all debugging messages by default if the driver is compiled with
MYX_DEBUG


# 1.2 31-May-2007 reyk

fix the myx_write function


# 1.1 31-May-2007 reyk

initial bits of a new driver for the Myricom Myri-10G Lanai-Z8E 10Gb
Ethernet chipset. not working yet.

ok dlg@


# 1.111 17-Jul-2020 dlg

name the rx rings so systat mb shows them.


# 1.110 17-Jul-2020 dlg

add kstats to myx.

myx is unusually minimal, so there's not a lot of information that
the chip provides. the most interesting is the number of packets
the chip drops cos of a lack of space on the rx rings.


# 1.109 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.108 03-Jul-2019 dlg

use ifiq_input return values to apply backpressure to rings.


# 1.107 16-Apr-2019 dlg

i2c reads are more reliable a byte at a time.

reading all 256 at a time was a nice idea, but meant page 0xa2 wasnt
appearing like it should. this follows what freebsd does more
closely too.


# 1.106 16-Apr-2019 dlg

make sff page reads work on little endian archs too. like amd64.

some modules seem to need more time when waiting for bytes while here.

hrvoje popovski hit the endian issue


# 1.105 15-Apr-2019 dlg

implement SIOCGIFSFFPAGE so ifconfig can get transceiver info.

myx doesn't allow i2c writes, so you can only read whatever page
the firmware is already pointing at on device 0xa0. if you try to
read another page it will return ENXIO.

tested on a 10G-PCIE-8A-R with an xfp module.


# 1.104 15-Apr-2019 dlg

trim some debug code that printed out the name of a command

the list of commands is going to grow, but the thought of keeping
the list in debug code up to date with it just makes me feel tired.

this prints the command id number instead in the same format we
represent it in the header.


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.103 01-Aug-2017 dlg

defer init of the myxmcl pool to mountroot, and enable pool cpu caches.

pool_cache_init cannot be called during autoconf because we cant
be confident about the number of cpus in the machine until the first
run of attaches.

mountroot is after autoconf, and myx already has code that runs
there for the firmware loading.

discussed with deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.102 07-Feb-2017 dlg

move the mbuf pools to m_pool_init and a single global memory limit

this replaces individual calls to pool_init, pool_set_constraints, and
pool_sethardlimit with calls to m_pool_init. m_pool_init inits the
mbuf pools with the mbuf pool allocator, and because of that doesnt
set per pool limits.

ok bluhm@ as part of a larger diff


# 1.101 24-Jan-2017 dlg

add support for multiple transmit ifqueues per network interface.

an ifq to transmit a packet is picked by the current traffic
conditioner (ie, priq or hfsc) by providing an index into an array
of ifqs. by default interfaces get a single ifq but can ask for
more using if_attach_queues().

the vast majority of our drivers still think there's a 1:1 mapping
between interfaces and transmit queues, so their if_start routines
take an ifnet pointer instead of a pointer to the ifqueue struct.
instead of changing all the drivers in the tree, drivers can opt
into using an if_qstart routine and setting the IFXF_MPSAFE flag.
the stack provides a compatibility wrapper from the new if_qstart
handler to the previous if_start handlers if IFXF_MPSAFE isnt set.

enabling hfsc on an interface configures it to transmit everything
through the first ifq. any other ifqs are left configured as priq,
but unused, when hfsc is enabled.

getting this in now so everyone can kick the tyres.

ok mpi@ visa@ (who provided some tweaks for cnmac).


# 1.100 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.99 31-Oct-2016 dlg

turns out these chips can handle buffers up to 9400 bytes in length.

raise the mtu to 9380 bytes so we can take advantage of the extra space.

i need to revisit the macro names at some point.


# 1.98 31-Oct-2016 dlg

revert 1.97 where i moved myx to using the system pools

my early revision board doesnt like it at all


# 1.97 28-Oct-2016 dlg

get rid of the custom pool in myx for jumbo frames.

now it asks the mbuf layer for the 9k from its pools.

a question from chris@ made me go look at the chip doco again and i
realised that the chip only requires 4 byte alignment for rx buffers,
no 4k alignment for jumbo buffers.

i also found that the chip is supposed to be able to rx up to 9400
bytes instead of 9000. ill fix that later though.


# 1.96 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


Revision tags: OPENBSD_6_0_BASE
# 1.95 23-May-2016 tedu

remove the function pointer from mbufs. this memory is shared with data
via unions, and we don't want to make it easy to control the target.
instead an integer index into an array of acceptable functions is used.
drivers using custom functions must register them to receive an index.
ok deraadt


# 1.94 13-Apr-2016 mpi

G/C IFQ_SET_READY().


# 1.93 13-Apr-2016 mpi

G/C IFQ_SET_READY().


Revision tags: OPENBSD_5_9_BASE
# 1.92 11-Dec-2015 mpi

Replace mountroothook_establish(9) by config_mountroot(9) a narrower API
similar to config_defer(9).

ok mikeb@, deraadt@


# 1.91 09-Dec-2015 dlg

rework the if_start mpsafe serialisation so it can serialise arbitrary work

work is represented by struct task.

the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.

this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified in between when a start
routine discovers theres no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.

by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.

tested on various nics
ok mpi@


# 1.90 03-Dec-2015 dlg

tell the stack myx_start is mpsafe.

as per the stack commit, the driver changes are:

1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)


# 1.89 01-Dec-2015 dlg

myx doesnt use atomic.h anymore.


# 1.88 25-Nov-2015 dlg

replace IFF_OACTIVE manipulation with mpsafe operations.

there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.

IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operations mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.

instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.

this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifq_{set,clr,is}_oactive API too.

ok kettenis@ mpi@ jmatthew@ deraadt@
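
A minimal userland sketch of the idea in 1.88, not the kernel implementation. struct txq and the txq_* functions are hypothetical stand-ins for struct ifqueue and its oactive API. The mark lives in its own word, so updating it never does a read-modify-write on unrelated flag bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical queue with its own oactive mark, separate from if_flags. */
struct txq {
	atomic_uint oactive;
};

static void
txq_set_oactive(struct txq *q)
{
	atomic_store(&q->oactive, 1);
}

static void
txq_clr_oactive(struct txq *q)
{
	atomic_store(&q->oactive, 0);
}

static bool
txq_is_oactive(struct txq *q)
{
	return (atomic_load(&q->oactive) != 0);
}
```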


# 1.87 24-Nov-2015 dlg

fix tx ring accounting in myx_start.

turns out i was calculating the number of packets (not descriptors)
on the tx ring, and then using that as the free space for descriptors.


# 1.86 19-Nov-2015 dlg

get rid of sc_tx_free and the atomic ops on it in myx_start and myx_txeof.

myx_start calculates the free space by reading the consumer index
and doing some maths, which lets us avoid the interlocked cpu ops.
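
A minimal sketch of the kind of arithmetic 1.86 refers to, not the driver's code. The helper name ring_free is hypothetical, and it assumes the producer and consumer indexes are free-running 32-bit counters.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Free slots in a ring of "size" descriptors. prod and cons are
 * free-running counters, so the unsigned subtraction stays correct
 * when either counter wraps past 2^32.
 */
static uint32_t
ring_free(uint32_t prod, uint32_t cons, uint32_t size)
{
	uint32_t used = prod - cons;	/* descriptors not yet completed */

	return (size - used);
}
```

Because the result is derived from two indexes that each have a single writer, no shared free counter needs to be updated atomically.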


# 1.85 25-Oct-2015 mpi

arp_ifinit() is no longer needed.


# 1.84 29-Sep-2015 dlg

get rid of the mutex between access to the status block and myx_down

myx is unusual in that it has an explicit command to shut down the
chip that gets an interrupt when it's done. so myx_down sends the
command and has to sleep until it gets that interrupt. this moves
to using a single int to represent that state (so loads and stores
are atomic), and sleep_setup/sleep_finish in myx_down to wait for
it to change.

this has been running in production at work for a few months now
tested by chris@


# 1.83 01-Sep-2015 deraadt

free() firmware with right len; ok dlg


# 1.82 15-Aug-2015 dlg

do the global tx free accounting in myx_start with a single atomic op
instead of one per packet.

seems to let me send packets a little faster.


# 1.81 15-Aug-2015 dlg

rework the tx path to use a ring to keep track of dmamaps/mbufs.

this removes the myx_buf structure and uses myx_slot instead. theyre
the same except slots dont have list entry structures, so theyre
smaller.

this cuts out four mutex ops per packet out of the tx handling.
just have to get rid of the atomic op per packet in myx_start now.


# 1.80 14-Aug-2015 dlg

move to a per rx ring timeout for refilling empty rings.

this lets me get rid of the locking around the refilling of the rx ring.

the timeout only runs refill if the rx ring is empty. we know rxeof
wont try and refill it in that situation because there's no packets
on the ring so we wont get interrupts for it. therefore we dont
need to lock between the timeout and rxeof cos they cant run at the
same time.


# 1.79 14-Aug-2015 dlg

rework how we track the packets on the rx rings.

originally there were two mutex protected lists for rx packets, a
list of free packets, and a list of packets that were on the ring.
filling the ring popped packets off the free list, attached an mbuf
and dmamapped it, and pushed it onto the list of active packets.
the hw fills packets in order, so on rx completion we'd pop packets
off the active list, unmap the mbuf and shove it up the stack before
putting the packet on the free list.

the problem with the lists is that every rx ring operation resulted
in two mutex ops. so 4 mutex ops per packet after you do both fill
and rxeof.

this replaces the mutexed lists with rings that shadow the hardware
rings. filling the rx ring pushes a producer index along, while
rxeof chases it with a consumer. because we know only one thing can
do either of those tasks at a time, we can get away with not using
atomic ops for them.

there's more to be done, but this is a good first step.
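
A minimal sketch of a shadow ring driven by a producer and a consumer index, as described in 1.79. It is not the driver's code: struct slot_ring, ring_put, ring_get and RING_SLOTS are hypothetical names, and void pointers stand in for the mbuf and dmamap pairs. It assumes, as the commit does, that only one context fills and only one context completes at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS	8	/* hypothetical size, must be a power of two */

struct slot_ring {
	void		*slots[RING_SLOTS];
	uint32_t	 prod;	/* advanced only by the fill path */
	uint32_t	 cons;	/* advanced only by the rxeof path */
};

/* Fill path: returns 0 on success, -1 if the ring is full. */
static int
ring_put(struct slot_ring *r, void *buf)
{
	if (r->prod - r->cons == RING_SLOTS)
		return (-1);
	r->slots[r->prod++ & (RING_SLOTS - 1)] = buf;
	return (0);
}

/* Completion path: returns the oldest buffer, or NULL if empty. */
static void *
ring_get(struct slot_ring *r)
{
	if (r->cons == r->prod)
		return (NULL);
	return (r->slots[r->cons++ & (RING_SLOTS - 1)]);
}
```

Buffers come back in the order they were posted, which matches the hardware filling packets in order. A version shared between CPUs would also need memory barriers, which this sketch leaves out.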


Revision tags: OPENBSD_5_8_BASE
# 1.78 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.77 17-May-2015 chris

We don't need KERNEL_LOCK() around if_input() anymore, as if_input() has
appropriate locking around bpf now.

ok dlg@


# 1.76 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.75 20-Feb-2015 chris

Now that if_input() is a thing, use it

ok dlg@


# 1.74 18-Feb-2015 dlg

myri employees and their drivers for linux and solaris have repeatedly
told me that if you're going to rx into buffers greater than 4k in
size, they have to be aligned to a 4k boundary.

the mru of this chip is 9k, but ive been using the 12k mcl pool to
provide the alignment. however, if we move to putting 8 items on a
pool page there'll be enough slack space in the mcl12k pool pages
to allow item colouring, which in turn will break the chip requirement
above. in practice the chips i have seem to work fine with unaligned
buffers, but i dont want to risk breaking early revision chips.

this moves myx to using a private pool for allocating clusters for
the big rx ring. the item size is 9k, but we specify a 4k alignment
so every item we get out of it will be correct for the chip.


# 1.73 18-Feb-2015 dlg

enable pcie relaxed transaction ordering and bump the max payload
size up to 4k.

found while reading someone elses driver.


# 1.72 22-Dec-2014 tedu

unifdef INET


# 1.71 28-Oct-2014 dlg

the if_rxring accounting would get screwed up if the first mbuf to
be put on the ring couldnt be allocated.

this pulls the code that puts the mbufs on the ring out of myx_rx_fill
so it can return early if firstmb cant be allocated, which puts it
in the right place to return unused slots to the if_rxring.

this means myx rx wont lock up if you're DoSsed to the point where
you exhaust your mbuf pools and cant allocate mbufs for the ring.

ok jmatthew@


# 1.70 04-Oct-2014 dlg

replace mutexes to serialise the operations on the flag that restricts
the number of contexts that are refilling the rx rings with atomic
ops.

this is borrowed from code i wrote for the scsi midlayer but cant
put in yet because i havent got atomic.h up to scratch on all archs
yet. the archs myx runs on do have enough atomic.h to be fine though.


# 1.69 03-Oct-2014 dlg

refill the rx ring in myx_rxeof, not much later at the end of myx_intr.


# 1.68 03-Oct-2014 dlg

in rxeof, instead of taking the biglock on every packet to call bpf
and ether_input, queue all the mbufs onto an mbuf_list on the stack
and then take the biglock once outside the loop.


# 1.67 03-Oct-2014 dlg

we dont need the kernel lock to call bus_dmamap_load and unload thanks
to kettenis.

move the if_ipacket and if_opacket increments out of biglock too.
theyre only updated from the interrupt handler, which is only run
on a single cpu so there's no chance of the update racing. everywhere
else only reads them.


# 1.66 03-Oct-2014 dlg

dont need to hold the kernel lock to call MCLGETI and m_freem now.


# 1.65 03-Oct-2014 dlg

dont take the kernel lock on every interrupt in case we might change
the link state or to clear OACTIVE, just take it when we know we
really are going to do those things.


# 1.64 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


# 1.63 19-Aug-2014 dlg

in myx_start, replace

while (space) {
IFQ_POLL;
myx_dequeue(free descr);
IFQ_DEQUEUE;
etc;
}

with

while (space && myx_dequeue(free descr)) {
IFQ_DEQUEUE;
etc;
}


# 1.62 18-Aug-2014 dlg

dont rely on mbuf.h to provide pool.h.

ok miod@, who has offered to help with any MD fallout
ok guenther@


Revision tags: OPENBSD_5_6_BASE
# 1.61 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.60 10-Jul-2014 dlg

rings that dont rx packets dont need to be refilled.


# 1.59 08-Jul-2014 dlg

cut things that relied on mclgeti for rx ring accounting/restriction over
to using if_rxr.

cut the reporting systat did over to the rxr ioctl.

tested as much as i can on alpha, amd64, and sparc64.
mpi@ has run it on macppc.
ok mpi@


# 1.58 17-Jun-2014 dlg

whitespace fix.

im sick of fixing this by hand on all my boxes while hacking on
other stuff and having it pollute my diffs.

no functional change.


# 1.57 24-Mar-2014 dlg

nothing after the irq ack posting relies on it being ordered.


Revision tags: OPENBSD_5_5_BASE
# 1.56 10-Feb-2014 dlg

the mac addresses you program with MYXCMD_SET_MCASTGROUP are in a different
format to the one used for MYXCMD_SET_LLADDR. for reasons.

this lets ospf work if you dont happen to have PROMISC enabled on your
interface like my production firewalls happen to have, which is why i
never noticed this before.


# 1.55 05-Feb-2014 dlg

after running myx(4) without biglock in production for a few days
i discovered that there's a race between the interrupt code and
myx_start which causes the count of free tx descriptors to get
distorted, which eventually leads to a permanent setting of
IFF_OACTIVE, which in turn prevents the driver from transmitting
packets.

fixing that went horribly wrong when i then discovered that there's
a race between the interrupt handler and myx_down, where the interrupt
can tell myx_down to wake up and free all the rings while the
interrupt handler is still looking at them. free panics for all.

this moves the handling of the tx free count under the biglock (for
now), and moves myx_up and myx_down to managing a "driver state"
variable independently of the IFF_UP and IFF_RUNNING flags, and
very very careful reordering of the checks of that state variable
and the hardware state.

as a bonus we get to avoid excessive calls to myx_txeof and myx_rxeof
in the isr, and less stuff checked unconditionally. on the other
hand, the sc_state handling added some more checks so it might not
be a win overall.

tested on smp sparc64 with msi and nonmsi interrupts, and on amd64 smp
in production again.


# 1.54 31-Jan-2014 dlg

sc_function is set, but never used for anything useful. clean it up...


# 1.53 31-Jan-2014 dlg

sc_lladdr is never used, so we can get the space in the sc back.


# 1.52 23-Jan-2014 dlg

a lot of people have pointed out to me that taking a lock just to read an
int isnt necessary.


# 1.51 23-Jan-2014 dlg

factor the mutex/bus_space handling of the sts block out.


# 1.50 21-Jan-2014 dlg

introduce fine grained locking.

this doesnt give up the big lock coming from process context, only from
the interrupt side. it is excessively careful about when it takes
the big lock again. notably it goes to a lot of effort to not hold
a mutex while calling into other subsystems or before taking the
big lock.

ive been hitting it as hard as i can without problems.

intensely read by mpi@
ok claudio@ kettenis@


# 1.49 19-Jan-2014 dlg

white space fix


# 1.48 19-Jan-2014 dlg

introduce fine grained locking around the lists of packet handlers
myx maintains. this moves it away from relying on splnet to protect
them.


# 1.47 19-Jan-2014 dlg

hwflags is never used, so clean it up


# 1.46 19-Jan-2014 dlg

replace bcmp with memcmp


# 1.45 19-Jan-2014 dlg

bcopy to memcpy


# 1.44 19-Jan-2014 dlg

replace bzero with memset.


# 1.43 19-Jan-2014 dlg

all 64bit archs myx runs on support bus_space 8 things because of work i
did at n2k13.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.42 29-Jan-2013 brad

- Set ENETRESET within myx_ioctl() instead of calling myx_iff() directly, to be
consistent with other drivers.
- Clear IFF_ALLMULTI flag early and at the top of myx_iff().
- Set IFF_ALLMULTI when in promisc mode.

ok dlg@


# 1.41 25-Jan-2013 dlg

we go to a lot of effort to post the first tx descriptor last, but we
really should be trying to post everything except the flags field in the
first tx descriptor. this shuffles things around so the rest of that first
txd is posted as part of the "everything else" before its flags field.


# 1.4 31-May-2007 reyk

further improvement of the bus space i/o. firmware loading, booting,
and card initialization works now.

thanks to dlg@ who pointed me to the fact that
bus_space_write_region_N and bus_space_write_raw_region_N use count of
elements vs. size of buffer arguments.


# 1.3 31-May-2007 reyk

enable all debugging messages by default if the driver is compiled with
MYX_DEBUG


# 1.2 31-May-2007 reyk

fix the myx_write function


# 1.1 31-May-2007 reyk

initial bits of a new driver for the Myricom Myri-10G Lanai-Z8E 10Gb
Ethernet chipset. not working yet.

ok dlg@


# 1.109 10-Jul-2020 patrick

Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.

ok dlg@ tobhe@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE
# 1.108 03-Jul-2019 dlg

use ifiq_input return values to apply backpressure to rings.


# 1.107 16-Apr-2019 dlg

i2c reads are more reliable a byte at a time.

reading all 256 at a time was a nice idea, but meant page 0xa2 wasnt
appearing like it should. this follows what freebsd does more
closely too.


# 1.106 16-Apr-2019 dlg

make sff page reads work on little endian archs too. like amd64.

some modules seem to need more time when waiting for bytes while here.

hrvoje popovski hit the endian issue


# 1.105 15-Apr-2019 dlg

implement SIOCGIFSFFPAGE so ifconfig can get transceiver info.

myx doesn't allow i2c writes, so you can only read whatever page
the firmware is already pointing at on device 0xa0. if you try to
read another page it will return ENXIO.

tested on a 10G-PCIE-8A-R with an xfp module.


# 1.104 15-Apr-2019 dlg

trim some debug code that printed out the name of a command

the list of commands is going to grow, but the thought of keeping
the list in debug code up to date with it just makes me feel tired.

this prints the command id number instead in the same format we
represent it in the header.


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.103 01-Aug-2017 dlg

defer init of the myxmcl pool to mountroot, and enable pool cpu caches.

pool_cache_init cannot be called during autoconf because we cant
be confident about the number of cpus in the machine until the first
run of attaches.

mountroot is after autoconf, and myx already has code that runs
there for the firmware loading.

discussed with deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.102 07-Feb-2017 dlg

move the mbuf pools to m_pool_init and a single global memory limit

this replaces individual calls to pool_init, pool_set_constraints, and
pool_sethardlimit with calls to m_pool_init. m_pool_init inits the
mbuf pools with the mbuf pool allocator, and because of that doesnt
set per pool limits.

ok bluhm@ as part of a larger diff


# 1.101 24-Jan-2017 dlg

add support for multiple transmit ifqueues per network interface.

an ifq to transmit a packet is picked by the current traffic
conditioner (ie, priq or hfsc) by providing an index into an array
of ifqs. by default interfaces get a single ifq but can ask for
more using if_attach_queues().

the vast majority of our drivers still think there's a 1:1 mapping
between interfaces and transmit queues, so their if_start routines
take an ifnet pointer instead of a pointer to the ifqueue struct.
instead of changing all the drivers in the tree, drivers can opt
into using an if_qstart routine and setting the IFXF_MPSAFE flag.
the stack provides a compatibility wrapper from the new if_qstart
handler to the previous if_start handlers if IFXF_MPSAFE isnt set.

enabling hfsc on an interface configures it to transmit everything
through the first ifq. any other ifqs are left configured as priq,
but unused, when hfsc is enabled.

getting this in now so everyone can kick the tyres.

ok mpi@ visa@ (who provided some tweaks for cnmac).


# 1.100 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.99 31-Oct-2016 dlg

turns out these chips can handle buffers up to 9400 bytes in length.

raise the mtu to 9380 bytes so we can take advantage of the extra space.

i need to revisit the macro names at some point.


# 1.98 31-Oct-2016 dlg

revert 1.97 where i moved myx to using the system pools

my early revision board doesnt like it at all


# 1.97 28-Oct-2016 dlg

get rid of the custom pool in myx for jumbo frames.

now it asks the mbuf layer for the 9k from its pools.

a question from chris@ made me go look at the chip doco again and i
realised that the chip only requires 4 byte alignment for rx buffers,
no 4k alignment for jumbo buffers.

i also found that the chip is supposed to be able to rx up to 9400
bytes instead of 9000. ill fix that later though.


# 1.96 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


Revision tags: OPENBSD_6_0_BASE
# 1.95 23-May-2016 tedu

remove the function pointer from mbufs. this memory is shared with data
via unions, and we don't want to make it easy to control the target.
instead an integer index into an array of acceptable functions is used.
drivers using custom functions must register them to receive an index.
ok deraadt


# 1.94 13-Apr-2016 mpi

G/C IFQ_SET_READY().


# 1.93 13-Apr-2016 mpi

G/C IFQ_SET_READY().


Revision tags: OPENBSD_5_9_BASE
# 1.92 11-Dec-2015 mpi

Replace mountroothook_establish(9) by config_mountroot(9) a narrower API
similar to config_defer(9).

ok mikeb@, deraadt@


# 1.91 09-Dec-2015 dlg

rework the if_start mpsafe serialisation so it can serialise arbitrary work

work is represented by struct task.

the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.

this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified in between when a start
routine discovers theres no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.

by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.

tested on various nics
ok mpi@


# 1.90 03-Dec-2015 dlg

tell the stack myx_start is mpsafe.

as per the stack commit, the driver changes are:

1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)


# 1.89 01-Dec-2015 dlg

myx doesnt use atomic.h anymore.


# 1.88 25-Nov-2015 dlg

replace IFF_OACTIVE manipulation with mpsafe operations.

there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.

IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operations mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.

instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.

this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifq_{set,clr,is}_oactive API too.

ok kettenis@ mpi@ jmatthew@ deraadt@


# 1.87 24-Nov-2015 dlg

fix tx ring accounting in myx_start.

turns out i was calculating the number of packets (not descriptors)
on the tx ring, and then using that as the free space for descriptors.


# 1.86 19-Nov-2015 dlg

get rid of sc_tx_free and the atomic ops on it in myx_start and myx_txeof.

myx_start calculates the free space by reading the consumer index
and doing some maths, which lets us avoid the interlocked cpu ops.
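
A minimal sketch of the kind of arithmetic this describes, not the driver's actual code. It assumes free-running unsigned producer and consumer counters; the names tx_ring_free, prod, cons and ring_size are hypothetical. The point is that the free descriptor count can be derived from the two indexes, so no shared free counter needs an interlocked update.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of free descriptors on a tx ring.
 * prod and cons are free-running counters that wrap at 2^32.
 */
static uint32_t
tx_ring_free(uint32_t prod, uint32_t cons, uint32_t ring_size)
{
	uint32_t used = prod - cons;	/* unsigned subtraction copes with wrap */

	assert(used <= ring_size);
	return (ring_size - used);
}
```

Note that the result counts descriptors, not packets; a packet that spans several descriptors uses up several units of this space.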


# 1.85 25-Oct-2015 mpi

arp_ifinit() is no longer needed.


# 1.84 29-Sep-2015 dlg

get rid of the mutex between access to the status block and myx_down

myx is unusual in that it has an explicit command to shut down the
chip that gets an interrupt when it's done. so myx_down sends the
command and has to sleep until it gets that interrupt. this moves
to using a single int to represent that state (so loads and stores
are atomic), and sleep_setup/sleep_finish in myx_down to wait for
it to change.

this has been running in production at work for a few months now
tested by chris@


# 1.83 01-Sep-2015 deraadt

free() firmware with right len; ok dlg


# 1.82 15-Aug-2015 dlg

do the global tx free accounting in myx_start with a single atomic op
instead of one per packet.

seems to let me send packets a little faster.


# 1.81 15-Aug-2015 dlg

rework the tx path to use a ring to keep track of dmamaps/mbufs.

this removes the myx_buf structure and uses myx_slot instead. theyre
the same except slots dont have list entry structures, so theyre
smaller.

this cuts out four mutex ops per packet out of the tx handling.
just have to get rid of the atomic op per packet in myx_start now.


# 1.80 14-Aug-2015 dlg

move to a per rx ring timeout for refilling empty rings.

this lets me get rid of the locking around the refilling of the rx ring.

the timeout only runs refill if the rx ring is empty. we know rxeof
wont try and refill it in that situation because there's no packets
on the ring so we wont get interrupts for it. therefore we dont
need to lock between the timeout and rxeof cos they cant run at the
same time.


# 1.79 14-Aug-2015 dlg

rework how we track the packets on the rx rings.

originally there were two mutex protected lists for rx packets, a
list of free packets, and a list of packets that were on the ring.
filling the ring popped packets off the free list, attached an mbuf
and dmamapped it, and pushed it onto the list of active packets.
the hw fills packets in order, so on rx completion we'd pop packets
off the active list, unmap the mbuf and shove it up the stack before
putting the packet on the free list.

the problem with the lists is that every rx ring operation resulted
in two mutex ops. so 4 mutex ops per packet after you do both fill
and rxeof.

this replaces the mutexed lists with rings that shadow the hardware
rings. filling the rx ring pushes a producer index along, while
rxeof chases it with a consumer. because we know only one thing can
do either of those tasks at a time, we can get away with not using
atomic ops for them.

there's more to be done, but this is a good first step.
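
The producer/consumer scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this revision: struct shadow_ring, ring_fill, ring_rxeof and RING_SLOTS are hypothetical names, and plain pointers stand in for the mbuf and dmamap state. It relies on the same assumption the message makes, that only one context fills and only one context completes at a time.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8	/* hypothetical size; must be a power of two */

struct shadow_ring {
	void		*slots[RING_SLOTS];	/* stands in for mbuf/dmamap state */
	unsigned int	 prod;			/* advanced only by the fill path */
	unsigned int	 cons;			/* advanced only by the rxeof path */
};

/* fill path: put a buffer in the next slot; returns 0 if the ring is full */
static int
ring_fill(struct shadow_ring *r, void *buf)
{
	assert(buf != NULL);
	if (r->prod - r->cons == RING_SLOTS)
		return (0);
	r->slots[r->prod++ % RING_SLOTS] = buf;
	return (1);
}

/* rxeof path: take the oldest buffer, matching in-order completion */
static void *
ring_rxeof(struct shadow_ring *r)
{
	if (r->cons == r->prod)
		return (NULL);
	return (r->slots[r->cons++ % RING_SLOTS]);
}
```

Because each index has a single writer, neither path needs a mutex or an atomic op in this sketch, which is the saving the message is after.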


Revision tags: OPENBSD_5_8_BASE
# 1.78 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.77 17-May-2015 chris

We don't need KERNEL_LOCK() around if_input() anymore, as if_input() has
appropriate locking around bpf now.

ok dlg@


# 1.76 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.75 20-Feb-2015 chris

Now that if_input() is a thing, use it

ok dlg@


# 1.74 18-Feb-2015 dlg

myri employees and their drivers for linux and solaris have repeatedly
told me that if you're going to rx into buffers greater than 4k in
size, they have to be aligned to a 4k boundary.

the mru of this chip is 9k, but ive been using the 12k mcl pool to
provide the alignment. however, if we move to putting 8 items on a
pool page there'll be enough slack space in the mcl12k pool pages
to allow item colouring, which in turn will break the chip requirement
above. in practice the chips i have seem to work fine with unaligned
buffers, but i dont want to risk breaking early revision chips.

this moves myx to using a private pool for allocating clusters for
the big rx ring. the item size is 9k, but we specify a 4k alignment
so every item we get out of it will be correct for the chip.


# 1.73 18-Feb-2015 dlg

enable pcie relaxed transaction ordering and bump the max payload
size up to 4k.

found while reading someone elses driver.


# 1.72 22-Dec-2014 tedu

unifdef INET


# 1.71 28-Oct-2014 dlg

the if_rxring accounting would get screwed up if the first mbuf to
be put on the ring couldnt be allocated.

this pulls the code that puts the mbufs on the ring out of myx_rx_fill
so it can return early if firstmb cant be allocated, which puts it
in the right place to return unused slots to the if_rxring.

this means myx rx wont lock up if you're DoSsed to the point where
you exhaust your mbuf pools and cant allocate mbufs for the ring.

ok jmatthew@


# 1.70 04-Oct-2014 dlg

replace mutexes to serialise the operations on the flag that restricts
the number of contexts that are refilling the rx rings with atomic
ops.

this is borrowed from code i wrote for the scsi midlayer but cant
put in yet because i havent got atomic.h up to scratch on all archs
yet. the archs myx runs on do have enough atomic.h to be fine though.


# 1.69 03-Oct-2014 dlg

refill the rx ring in myx_rxeof, not much later at the end of myx_intr.


# 1.68 03-Oct-2014 dlg

in rxeof, instead of taking the biglock on every packet to call bpf
and ether_input, queue all the mbufs onto an mbuf_list on the stack
and then take the biglock once outside the loop.


# 1.67 03-Oct-2014 dlg

we dont need the kernel lock to call bus_dmamap_load and unload thanks
to kettenis.

move the if_ipacket and if_opacket increments out of biglock too.
theyre only updated from the interrupt handler, which is only run
on a single cpu so there's no chance of the update racing. everywhere
else only reads them.


# 1.66 03-Oct-2014 dlg

dont need to hold the kernel lock to call MCLGETI and m_freem now.


# 1.65 03-Oct-2014 dlg

dont take the kernel lock on every interrupt in case we might change
the link state or to clear OACTIVE, just take it when we know we
really are going to do those things.


# 1.64 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


# 1.63 19-Aug-2014 dlg

in myx_start, replace

while (space) {
IFQ_POLL;
myx_dequeue(free descr);
IFQ_DEQUEUE;
etc;
}

with

while (space && myx_dequeue(free descr)) {
IFQ_DEQUEUE;
etc;
}


# 1.62 18-Aug-2014 dlg

dont rely on mbuf.h to provide pool.h.

ok miod@, who has offered to help with any MD fallout
ok guenther@


Revision tags: OPENBSD_5_6_BASE
# 1.61 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.60 10-Jul-2014 dlg

rings that dont rx packets dont need to be refilled.


# 1.59 08-Jul-2014 dlg

cut things that relied on mclgeti for rx ring accounting/restriction over
to using if_rxr.

cut the reporting systat did over to the rxr ioctl.

tested as much as i can on alpha, amd64, and sparc64.
mpi@ has run it on macppc.
ok mpi@


# 1.58 17-Jun-2014 dlg

whitespace fix.

im sick of fixing this by hand on all my boxes while hacking on
other stuff and having it pollute my diffs.

no functional change.


# 1.57 24-Mar-2014 dlg

nothing after the irq ack posting relies on it being ordered.


Revision tags: OPENBSD_5_5_BASE
# 1.56 10-Feb-2014 dlg

the mac addresses you program with MYXCMD_SET_MCASTGROUP are in a different
format to the one used for MYXCMD_SET_LLADDR. for reasons.

this lets ospf work if you dont happen to have PROMISC enabled on your
interface like my production firewalls happen to have, which is why i
never noticed this before.


# 1.55 05-Feb-2014 dlg

after running myx(4) without biglock in production for a few days
i discovered that there's a race between the interrupt code and
myx_start which causes the count of free tx descriptors to get
distorted, which eventually leads to a permanent setting of
IFF_OACTIVE, which in turn prevents the driver from transmitting
packets.

fixing that went horribly wrong when i then discovered that there's
a race between the interrupt handler and myx_down, where the interrupt
can tell myx_down to wake up and free all the rings while the
interrupt handler is still looking at them. free panics for all.

this moves the handling of the tx free count under the biglock (for
now), and moves myx_up and myx_down to managing a "driver state"
variable independently of the IFF_UP and IFF_RUNNING flags, and
very very careful reordering of the checks of that state variable
and the hardware state.

as a bonus we get to avoid excessive calls to myx_txeof and myx_rxeof
in the isr, and less stuff checked unconditionally. on the other
hand, the sc_state handling added some more checks so it might not
be a win overall.

tested on smp sparc64 with msi and nonmsi interrupts, and on amd64 smp
in production again.


# 1.54 31-Jan-2014 dlg

sc_function is set, but never used for anything useful. clean it up...


# 1.53 31-Jan-2014 dlg

sc_lladdr is never used, so we can get the space in the sc back.


# 1.52 23-Jan-2014 dlg

a lot of people have pointed out to me that taking a lock just to read an
int isnt necessary.


# 1.51 23-Jan-2014 dlg

factor the mutex/bus_space handling of the sts block out.


# 1.50 21-Jan-2014 dlg

introduce fine grained locking.

this doesnt give up the big lock coming from process context, only from
the interrupt side. it is excessively careful about when it takes
the big lock again. notably it goes to a lot of effort to not hold
a mutex while calling into other subsystems or before taking the
big lock.

ive been hitting it as hard as i can without problems.

intensely read by mpi@
ok claudio@ kettenis@


# 1.49 19-Jan-2014 dlg

white space fix


# 1.48 19-Jan-2014 dlg

introduce fine grained locking around the lists of packet handlers
myx maintains. this moves it away from relying on splnet to protect
them.


# 1.47 19-Jan-2014 dlg

hwflags is never used, so clean it up


# 1.46 19-Jan-2014 dlg

replace bcmp with memcmp


# 1.45 19-Jan-2014 dlg

bcopy to memcpy


# 1.44 19-Jan-2014 dlg

replace bzero with memset.


# 1.43 19-Jan-2014 dlg

all 64bit archs myx runs on support bus_space 8 things because of work i
did at n2k13.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.42 29-Jan-2013 brad

- Set ENETRESET within myx_ioctl() instead of calling myx_iff() directly, to be
consistent with other drivers.
- Clear IFF_ALLMULTI flag early and at the top of myx_iff().
- Set IFF_ALLMULTI when in promisc mode.

ok dlg@


# 1.41 25-Jan-2013 dlg

we go to a lot of effort to post the first tx descriptor last, but we
really should be trying to post everything except the flags field in the
first tx descriptor. this shuffles things around so the rest of that first
txd is posted as part of the "everything else" before its flags field.


# 1.40 25-Jan-2013 dlg

the myx_dmamem struct doesnt need a name.


# 1.39 21-Jan-2013 dlg

myx does reads and writes in one direction to packet buffers. lets try
STREAMING them.


# 1.38 15-Jan-2013 dlg

amd64 is currently broken cos it has no bus_space_write_raw_region_8.
disabling its use for now.


# 1.37 15-Jan-2013 dlg

use bus_space_write_raw_region_8 on 64bit archs when writing to the rings


# 1.36 14-Jan-2013 dlg

map the registers PREFETCHABLE so things that can do write combining can
try and do write combining like the myx doco likes.


# 1.35 14-Jan-2013 dlg

avoid extra bus_space barriers in the interrupt handler.


# 1.34 14-Jan-2013 dlg

when posting descriptors to the chips rings, avoid going write barrier
write barrier write barrier when using myx_write to post descriptors.

instead lets go write write write barrier by using the appropriate
bus_space write directly followed by a single bus_space barrier.

the story above is mostly true, except that myx wants us to write all the
descriptors except the first, barrier, and then write the first one out to
signal that the chip can proceed.

it is also worth noting that the barriers cover more address space than
what we actually wrote to. this makes the code much simpler, and avoids
generating extra fence operations (which is what barrier functions end up
as on most of our archs) when we wrap around the end of the ring. the
bus_space doco encourages this.

bus_space use was discussed with krw@ kettenis@ deraadt@
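
The posting order described above can be sketched in userland C. This is only an analogy under stated assumptions: post_descs is a hypothetical name, an ordinary array stands in for the chip's ring, and a C11 release fence stands in for the bus_space barrier the driver would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Post n descriptors starting at slot 0: everything but the first,
 * one barrier, then the first to hand the chain to the consumer.
 */
static void
post_descs(volatile uint64_t *ring, const uint64_t *descs, size_t n)
{
	size_t i;

	assert(n > 0);
	for (i = 1; i < n; i++)		/* write write write ... */
		ring[i] = descs[i];
	atomic_thread_fence(memory_order_release);	/* ... one barrier */
	ring[0] = descs[0];		/* first descriptor goes out last */
}
```

The single fence replaces one barrier per write; the consumer only looks at the chain once the first slot has been written.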


# 1.33 14-Jan-2013 dlg

the myri doco suggests its nice to post stuff by filling in everything
in the rings except the first descriptor. once you've written as
much as you can out, then you go back and post the first descriptor
to signal that the chip should go ahead and work.


# 1.32 14-Jan-2013 dlg

;; is a long way of saying ;


# 1.31 29-Nov-2012 brad

Remove setting an initial assumed baudrate upon driver attach which is not
necessarily correct, there might not even be a link when attaching.

ok mikeb@ reyk@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.30 28-Nov-2011 blambert

Fix reversed error-handling gotos in myx_buf_fill(), which would lead to
either an mbuf leak or a NULL pointer dereference.

ok sthen@ claudio@ dlg@
testing claudio@ dlg@


Revision tags: OPENBSD_5_0_BASE
# 1.29 08-Aug-2011 dlg

myx requires the driver pad short ethernet frames to 60 bytes by
adding a descriptor pointing at zeroed bytes onto the end of transmit
chains. i was accounting for this extra descriptor when i was
completing the chain, but not when i was setting this up. this
meant the number of free descriptors kept growing until it overflowed.
at this point the check for space in the ring failed and packets
no longer flowed.

this counts the pad descriptor in the tx chain setup too.

ok deraadt@
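
A small sketch of the accounting rule this fix is about, assuming a packet uses one descriptor per DMA segment plus one pad descriptor when it is shorter than 60 bytes. The names tx_descs_needed and MYX_PAD_LEN are hypothetical and not taken from the driver.

```c
#include <assert.h>

#define MYX_PAD_LEN 60	/* hypothetical name; pad target from the message */

/*
 * Hypothetical helper: descriptors a packet takes on the tx ring.
 * nsegs is the number of DMA segments, pktlen the frame length in bytes.
 */
static unsigned int
tx_descs_needed(unsigned int nsegs, unsigned int pktlen)
{
	assert(nsegs > 0);
	/* short frames take one extra descriptor pointing at zeroed bytes */
	return (nsegs + (pktlen < MYX_PAD_LEN ? 1 : 0));
}
```

Using the same count when a chain is set up and when it is completed keeps the free descriptor total from drifting.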


# 1.28 23-Jun-2011 dlg

cope with empty rx rings by scheduling a timeout to keep trying until it
gets some packets onto the rings.

also annoying, but the hardware doesnt report empty rings, we have to
handle it ourselves.


# 1.27 23-Jun-2011 dlg

this chip has an annoying "feature" where it cannot report the link
state unless the chip is up and handling packets. while its down
it does not report the link state, so it is unknown.

this tweaks the link state handling, in particular it adds code to
myx_down so it moves the link state to unknown, ie, it correctly
reflects reality.

stupidity pointed out by deraadt


# 1.26 22-Jun-2011 deraadt

reset the tx_count on UP, since it may have been advanced from non-zero
by a previous use
ok claudio


# 1.25 22-Jun-2011 dlg

msi support. this is a complicated one...

ok kettenis@


# 1.24 22-Jun-2011 jsg

another myri10ge device matched by freebsd/linux drivers
ok dlg@


# 1.23 22-Jun-2011 dlg

oops, handle refill like i said i was going to two revisions ago.


# 1.22 22-Jun-2011 deraadt

set the mac address on the chip correctly (repair the byte order)
it now works on sparc64, too
ok dlg


# 1.108 03-Jul-2019 dlg

use ifiq_input return values to apply backpressure to rings.


# 1.107 16-Apr-2019 dlg

i2c reads are more reliable a byte at a time.

reading all 256 at a time was a nice idea, but meant page 0xa2 wasnt
appearing like it should. this follows what freebsd does more
closely too.


# 1.106 16-Apr-2019 dlg

make sff page reads work on little endian archs too. like amd64.

some modules seem to need more time when waiting for bytes while here.

hrvoje popovski hit the endian issue


# 1.105 15-Apr-2019 dlg

implement SIOCGIFSFFPAGE so ifconfig can get transceiver info.

myx doesn't allow i2c writes, so you can only read whatever page
the firmware is already pointing at on device 0xa0. if you try to
read another page it will return ENXIO.

tested on a 10G-PCIE-8A-R with an xfp module.


# 1.104 15-Apr-2019 dlg

trim some debug code that printed out the name of a command

the list of commands is going to grow, but the thought of keeping
the list in debug code up to date with it just makes me feel tired.

this prints the command id number instead in the same format we
represent it in the header.


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.103 01-Aug-2017 dlg

defer init of the myxmcl pool to mountroot, and enable pool cpu caches.

pool_cache_init cannot be called during autoconf because we cant
be confident about the number of cpus in the machine until the first
run of attaches.

mountroot is after autoconf, and myx already has code that runs
there for the firmware loading.

discussed with deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.102 07-Feb-2017 dlg

move the mbuf pools to m_pool_init and a single global memory limit

this replaces individual calls to pool_init, pool_set_constraints, and
pool_sethardlimit with calls to m_pool_init. m_pool_init inits the
mbuf pools with the mbuf pool allocator, and because of that doesnt
set per pool limits.

ok bluhm@ as part of a larger diff


# 1.101 24-Jan-2017 dlg

add support for multiple transmit ifqueues per network interface.

an ifq to transmit a packet is picked by the current traffic
conditioner (ie, priq or hfsc) by providing an index into an array
of ifqs. by default interfaces get a single ifq but can ask for
more using if_attach_queues().

the vast majority of our drivers still think there's a 1:1 mapping
between interfaces and transmit queues, so their if_start routines
take an ifnet pointer instead of a pointer to the ifqueue struct.
instead of changing all the drivers in the tree, drivers can opt
into using an if_qstart routine and setting the IFXF_MPSAFE flag.
the stack provides a compatibility wrapper from the new if_qstart
handler to the previous if_start handlers if IFXF_MPSAFE isnt set.

enabling hfsc on an interface configures it to transmit everything
through the first ifq. any other ifqs are left configured as priq,
but unused, when hfsc is enabled.

getting this in now so everyone can kick the tyres.

ok mpi@ visa@ (who provided some tweaks for cnmac).


# 1.100 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.99 31-Oct-2016 dlg

turns out these chips can handle buffers up to 9400 bytes in length.

raise the mtu to 9380 bytes so we can take advantage of the extra space.

i need to revisit the macro names at some point.


# 1.98 31-Oct-2016 dlg

revert 1.97 where i moved myx to using the system pools

my early revision board doesnt like it at all


# 1.97 28-Oct-2016 dlg

get rid of the custom pool in myx for jumbo frames.

now it asks the mbuf layer for the 9k from its pools.

a question from chris@ made me go look at the chip doco again and i
realised that the chip only requires 4 byte alignment for rx buffers,
no 4k alignment for jumbo buffers.

i also found that the chip is supposed to be able to rx up to 9400
bytes instead of 9000. ill fix that later though.


# 1.96 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


Revision tags: OPENBSD_6_0_BASE
# 1.95 23-May-2016 tedu

remove the function pointer from mbufs. this memory is shared with data
via unions, and we don't want to make it easy to control the target.
instead an integer index into an array of acceptable functions is used.
drivers using custom functions must register them to receive an index.
ok deraadt


# 1.94 13-Apr-2016 mpi

G/C IFQ_SET_READY().


# 1.93 13-Apr-2016 mpi

G/C IFQ_SET_READY().


Revision tags: OPENBSD_5_9_BASE
# 1.92 11-Dec-2015 mpi

Replace mountroothook_establish(9) by config_mountroot(9) a narrower API
similar to config_defer(9).

ok mikeb@, deraadt@


# 1.91 09-Dec-2015 dlg

rework the if_start mpsafe serialisation so it can serialise arbitrary work

work is represented by struct task.

the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.

this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified in between when a start
routine discovers theres no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.

by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.

tested on various nics
ok mpi@


# 1.90 03-Dec-2015 dlg

tell the stack myx_start is mpsafe.

as per the stack commit, the driver changes are:

1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)


# 1.89 01-Dec-2015 dlg

myx doesnt use atomic.h anymore.


# 1.88 25-Nov-2015 dlg

replace IFF_OACTIVE manipulation with mpsafe operations.

there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.

IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operations mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.

instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.

this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifq_{set,clr,is}_oactive API too.

ok kettenis@ mpi@ jmatthew@ deraadt@


# 1.87 24-Nov-2015 dlg

fix tx ring accounting in myx_start.

turns out i was calculating the number of packets (not descriptors)
on the tx ring, and then using that as the free space for descriptors.


# 1.86 19-Nov-2015 dlg

get rid of sc_tx_free and the atomic ops on it in myx_start and myx_txeof.

myx_start calculates the free space by reading the consumer index
and doing some maths, which lets us avoid the interlocked cpu ops.
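
A minimal sketch of the free space maths described above, not the driver's
actual code. It assumes free-running unsigned producer and consumer indexes
that count descriptors; `ring_free` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: free descriptors on a ring of `size` slots.
 * prod is advanced by the start routine, cons by the completion path.
 */
static uint32_t
ring_free(uint32_t prod, uint32_t cons, uint32_t size)
{
	uint32_t used = prod - cons;	/* unsigned, so wraparound is harmless */

	assert(used <= size);
	return (size - used);
}
```

Because the subtraction is unsigned, the result stays correct when the
producer index wraps past zero, and no counter shared between the two paths
has to be updated atomically.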


# 1.85 25-Oct-2015 mpi

arp_ifinit() is no longer needed.


# 1.84 29-Sep-2015 dlg

get rid of the mutex between access to the status block and myx_down

myx is unusual in that it has an explicit command to shut down the
chip that gets an interrupt when it's done. so myx_down sends the
command and has to sleep until it gets that interrupt. this moves
to using a single int to represent that state (so loads and stores
are atomic), and sleep_setup/sleep_finish in myx_down to wait for
it to change.

this has been running in production at work for a few months now
tested by chris@


# 1.83 01-Sep-2015 deraadt

free() firmware with right len; ok dlg


# 1.82 15-Aug-2015 dlg

do the global tx free accounting in myx_start with a single atomic op
instead of one per packet.

seems to let me send packets a little faster.


# 1.81 15-Aug-2015 dlg

rework the tx path to use a ring to keep track of dmamaps/mbufs.

this removes the myx_buf structure and uses myx_slot instead. theyre
the same except slots dont have list entry structures, so theyre
smaller.

this cuts out four mutex ops per packet out of the tx handling.
just have to get rid of the atomic op per packet in myx_start now.


# 1.80 14-Aug-2015 dlg

move to a per rx ring timeout for refilling empty rings.

this lets me get rid of the locking around the refilling of the rx ring.

the timeout only runs refill if the rx ring is empty. we know rxeof
wont try and refill it in that situation because there's no packets
on the ring so we wont get interrupts for it. therefore we dont
need to lock between the timeout and rxeof cos they cant run at the
same time.


# 1.79 14-Aug-2015 dlg

rework how we track the packets on the rx rings.

originally there were two mutex protected lists for rx packets, a
list of free packets, and a list of packets that were on the ring.
filling the ring popped packets off the free list, attached an mbuf
and dmamapped it, and pushed it onto the list of active packets.
the hw fills packets in order, so on rx completion we'd pop packets
off the active list, unmap the mbuf and shove it up the stack before
putting the packet on the free list.

the problem with the lists is that every rx ring operation resulted
in two mutex ops. so 4 mutex ops per packet after you do both fill
and rxeof.

this replaces the mutexed lists with rings that shadow the hardware
rings. filling the rx ring pushes a producer index along, while
rxeof chases it with a consumer. because we know only one thing can
do either of those tasks at a time, we can get away with not using
atomic ops for them.

there's more to be done, but this is a good first step.
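
The producer/consumer arrangement above can be sketched roughly as follows.
This is an illustration under assumptions, not the myx code: `shadow_ring`,
`ring_fill`, `ring_rxeof` and `RING_SLOTS` are hypothetical names, and the
ring size is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8	/* hypothetical size, a power of two */

struct shadow_ring {
	void		*slots[RING_SLOTS];
	unsigned int	 prod;	/* only advanced by the fill path */
	unsigned int	 cons;	/* only advanced by the rxeof path */
};

/* put a buffer on the ring; returns 0 if the ring is already full */
static int
ring_fill(struct shadow_ring *r, void *buf)
{
	assert(buf != NULL);
	if (r->prod - r->cons == RING_SLOTS)
		return (0);
	r->slots[r->prod % RING_SLOTS] = buf;
	r->prod++;
	return (1);
}

/* take the oldest buffer off the ring; returns NULL if it is empty */
static void *
ring_rxeof(struct shadow_ring *r)
{
	void *buf;

	if (r->cons == r->prod)
		return (NULL);
	buf = r->slots[r->cons % RING_SLOTS];
	r->cons++;
	return (buf);
}
```

Each index has a single writer, which is why the sketch gets away without
mutexes or atomic ops as long as only one context fills and only one
context completes at a time.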


Revision tags: OPENBSD_5_8_BASE
# 1.78 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.77 17-May-2015 chris

We don't need KERNEL_LOCK() around if_input() anymore, as if_input() has
appropriate locking around bpf now.

ok dlg@


# 1.76 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.75 20-Feb-2015 chris

Now that if_input() is a thing, use it

ok dlg@


# 1.74 18-Feb-2015 dlg

myri employees and their drivers for linux and solaris have repeatedly
told me that if you're going to rx into buffers greater than 4k in
size, they have to be aligned to a 4k boundary.

the mru of this chip is 9k, but ive been using the 12k mcl pool to
provide the alignment. however, if we move to putting 8 items on a
pool page there'll be enough slack space in the mcl12k pool pages
to allow item colouring, which in turn will break the chip requirement
above. in practice the chips i have seem to work fine with unaligned
buffers, but i dont want to risk breaking early revision chips.

this moves myx to using a private pool for allocating clusters for
the big rx ring. the item size is 9k, but we specify a 4k alignment
so every item we get out of it will be correct for the chip.
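
A small sketch of the alignment arithmetic behind this change. It assumes
"9k" means 9 * 1024 bytes and "4k" means 4096 bytes; `align_up` and
`is_aligned` are hypothetical helpers, not pool(9) functions.

```c
#include <assert.h>
#include <stdint.h>

/* round v up to the next multiple of align (align must be a power of two) */
static uintptr_t
align_up(uintptr_t v, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((v + align - 1) & ~(align - 1));
}

/* true if v sits on an align boundary (align must be a power of two) */
static int
is_aligned(uintptr_t v, uintptr_t align)
{
	return ((v & (align - 1)) == 0);
}
```

With these assumptions a 9216 byte item rounds up to 12288 bytes, which is
consistent with the 12k pool having provided the alignment before item
colouring came into the picture.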


# 1.73 18-Feb-2015 dlg

enable pcie relaxed transaction ordering and bump the max payload
size up to 4k.

found while reading someone elses driver.


# 1.72 22-Dec-2014 tedu

unifdef INET


# 1.71 28-Oct-2014 dlg

the if_rxring accounting would get screwed up if the first mbuf to
be put on the ring couldnt be allocated.

this pulls the code that puts the mbufs on the ring out of myx_rx_fill
so it can return early if firstmb cant be allocated, which puts it
in the right place to return unused slots to the if_rxring.

this means myx rx wont lock up if you're DoSsed to the point where
you exhaust your mbuf pools and cant allocate mbufs for the ring.

ok jmatthew@


# 1.70 04-Oct-2014 dlg

replace mutexes to serialise the operations on the flag that restricts
the number of contexts that are refilling the rx rings with atomic
ops.

this is borrowed from code i wrote for the scsi midlayer but cant
put in yet because i havent got atomic.h up to scratch on all archs
yet. the archs myx runs on do have enough atomic.h to be fine though.


# 1.69 03-Oct-2014 dlg

refill the rx ring in myx_rxeof, not much later at the end of myx_intr.


# 1.68 03-Oct-2014 dlg

in rxeof, instead of taking the biglock on every packet to call bpf
and ether_input, queue all the mbufs onto an mbuf_list on the stack
and then take the biglock once outside the loop.


# 1.67 03-Oct-2014 dlg

we dont need the kernel lock to call bus_dmamap_load and unload thanks
to kettenis.

move the if_ipacket and if_opacket increments out of biglock too.
theyre only updated from the interrupt handler, which is only run
on a single cpu so there's no chance of the update racing. everywhere
else only reads them.


# 1.66 03-Oct-2014 dlg

dont need to hold the kernel lock to call MCLGETI and m_freem now.


# 1.65 03-Oct-2014 dlg

dont take the kernel lock on every interrupt in case we might change
the link state or to clear OACTIVE, just take it when we know we
really are going to do those things.


# 1.64 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


# 1.63 19-Aug-2014 dlg

in myx_start, replace

while (space) {
IFQ_POLL;
myx_dequeue(free descr);
IFQ_DEQUEUE;
etc;
}

with

while (space && myx_dequeue(free descr)) {
IFQ_DEQUEUE;
etc;
}


# 1.62 18-Aug-2014 dlg

dont rely on mbuf.h to provide pool.h.

ok miod@, who has offered to help with any MD fallout
ok guenther@


Revision tags: OPENBSD_5_6_BASE
# 1.61 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.60 10-Jul-2014 dlg

rings that dont rx packets dont need to be refilled.


# 1.59 08-Jul-2014 dlg

cut things that relied on mclgeti for rx ring accounting/restriction over
to using if_rxr.

cut the reporting systat did over to the rxr ioctl.

tested as much as i can on alpha, amd64, and sparc64.
mpi@ has run it on macppc.
ok mpi@


# 1.58 17-Jun-2014 dlg

whitespace fix.

im sick of fixing this by hand on all my boxes while hacking on
other stuff and having it pollute my diffs.

no functional change.


# 1.57 24-Mar-2014 dlg

nothing after the irq ack posting relies on it being ordered.


Revision tags: OPENBSD_5_5_BASE
# 1.56 10-Feb-2014 dlg

the mac addresses you program with MYXCMD_SET_MCASTGROUP are in a different
format to the one used for MYXCMD_SET_LLADDR. for reasons.

this lets ospf work if you dont happen to have PROMISC enabled on your
interface like my production firewalls happen to have, which is why i
never noticed this before.


# 1.55 05-Feb-2014 dlg

after running myx(4) without biglock in production for a few days
i discovered that there's a race between the interrupt code and
myx_start which causes the count of free tx descriptors to get
distorted, which eventually leads to a permanent setting of
IFF_OACTIVE, which in turn prevents the driver from transmitting
packets.

fixing that went horribly wrong when i then discovered that there's
a race between the interrupt handler and myx_down, where the interrupt
can tell myx_down to wake up and free all the rings while the
interrupt handler is still looking at them. free panics for all.

this moves the handling of the tx free count under the biglock (for
now), and moves myx_up and myx_down to managing a "driver state"
variable independently of the IFF_UP and IFF_RUNNING flags, and
very very careful reordering of the checks of that state variable
and the hardware state.

as a bonus we get to avoid excessive calls to myx_txeof and myx_rxeof
in the isr, and less stuff checked unconditionally. on the other
hand, the sc_state handling added some more checks so it might not
be a win overall.

tested on smp sparc64 with msi and nonmsi interrupts, and on amd64 smp
in production again.


# 1.54 31-Jan-2014 dlg

sc_function is set, but never used for anything useful. clean it up...


# 1.53 31-Jan-2014 dlg

sc_lladdr is never used, so we can get the space in the sc back.


# 1.52 23-Jan-2014 dlg

a lot of people have pointed out to me that taking a lock just to read an
int isnt necessary.


# 1.51 23-Jan-2014 dlg

factor the mutex/bus_space handling of the sts block out.


# 1.50 21-Jan-2014 dlg

introduce fine grained locking.

this doesnt give up the big lock coming from process context, only from
the interrupt side. it is excessively careful about when it takes
the big lock again. notably it goes to a lot of effort to not hold
a mutex while calling into other subsystems or before taking the
big lock.

ive been hitting it as hard as i can without problems.

intensely read by mpi@
ok claudio@ kettenis@


# 1.49 19-Jan-2014 dlg

white space fix


# 1.48 19-Jan-2014 dlg

introduce fine grained locking around the lists of packet handlers
myx maintains. this moves it away from relying on splnet to protect
them.


# 1.47 19-Jan-2014 dlg

hwflags is never used, so clean it up


# 1.46 19-Jan-2014 dlg

replace bcmp with memcmp


# 1.45 19-Jan-2014 dlg

bcopy to memcpy


# 1.44 19-Jan-2014 dlg

replace bzero with memset.


# 1.43 19-Jan-2014 dlg

all 64bit archs myx runs on support bus_space 8 things because of work i
did at n2k13.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.42 29-Jan-2013 brad

- Set ENETRESET within myx_ioctl() instead of calling myx_iff() directly, to be
consistent with other drivers.
- Clear IFF_ALLMULTI flag early and at the top of myx_iff().
- Set IFF_ALLMULTI when in promisc mode.

ok dlg@


# 1.41 25-Jan-2013 dlg

we go to a lot of effort to post the first tx descriptor last, but we
really should be trying to post everything except the flags field in the
first tx descriptor. this shuffles things around so the rest of that first
txd is posted as part of the "everything else" before its flags field.


# 1.40 25-Jan-2013 dlg

the myx_dmamem struct doesnt need a name.


# 1.39 21-Jan-2013 dlg

myx does reads and writes in one direction to packet buffers. lets try
STREAMING them.


# 1.38 15-Jan-2013 dlg

amd64 is currently broken cos it has no bus_space_write_raw_region_8,
so dont use it. disabling it for now.


# 1.37 15-Jan-2013 dlg

use bus_space_write_raw_region_8 on 64bit archs when writing to the rings


# 1.36 14-Jan-2013 dlg

map the registers PREFETCHABLE so things that can do write combining can
try and do write combining like the myx doco likes.


# 1.35 14-Jan-2013 dlg

avoid extra bus_space barriers in the interrupt handler.


# 1.34 14-Jan-2013 dlg

when posting descriptors to the chips rings, avoid going write barrier
write barrier write barrier when using myx_write to post descriptors.

instead let it go write write write barrier by using the appropriate
bus_space write directly followed by a single bus_space barrier.

the story above is mostly true, except that myx wants us to write all the
descriptors except the first, barrier, and then write the first one out to
signal that the chip can proceed.

it is also worth noting that the barriers cover more address space than
what we actually wrote to. this makes the code much simpler, and avoids
generating extra fence operations (which is what barrier functions end up
as on most of our archs) when we wrap around the end of the ring. the
bus_space doco encourages this.
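
The posting order described above can be sketched as below. This is a
userland illustration only: a plain array stands in for the mapped ring,
a C11 fence stands in for the bus_space barrier, and `post_descs`,
`ring` and `RING_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8	/* hypothetical ring size */

/* stands in for the device ring that the driver writes through bus_space */
static volatile uint64_t ring[RING_SIZE];

/*
 * Post n descriptors starting at slot `first`: everything except the
 * first descriptor, one barrier, then the first descriptor as the signal
 * that the chip can proceed.
 */
static void
post_descs(unsigned int first, const uint64_t *descs, unsigned int n)
{
	unsigned int i;

	assert(n > 0 && n <= RING_SIZE);

	for (i = 1; i < n; i++)
		ring[(first + i) % RING_SIZE] = descs[i];

	atomic_thread_fence(memory_order_release);	/* single barrier */

	ring[first % RING_SIZE] = descs[0];
	atomic_thread_fence(memory_order_release);
}
```

The sketch issues one fence for the whole batch instead of one per write,
and it does not need special handling when the batch wraps around the end
of the ring.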

bus_space use was discussed with krw@ kettenis@ deraadt@


# 1.33 14-Jan-2013 dlg

the myri doco suggests its nice to post stuff by filling in everything
in the rings except the first descriptor. once you've written as
much as you can out, then you go back and post the first descriptor
to signal that the chip should go ahead and work.


# 1.32 14-Jan-2013 dlg

;; is a long way of saying ;


# 1.31 29-Nov-2012 brad

Remove setting an initial assumed baudrate upon driver attach which is not
necessarily correct, there might not even be a link when attaching.

ok mikeb@ reyk@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.30 28-Nov-2011 blambert

Fix reversed error-handling gotos in myx_buf_fill(), which would lead to
either an mbuf leak or a NULL pointer dereference.

ok sthen@ claudio@ dlg@
testing claudio@ dlg@


Revision tags: OPENBSD_5_0_BASE
# 1.29 08-Aug-2011 dlg

myx requires the driver pad short ethernet frames to 60 bytes by
adding a descriptor pointing at zeroed bytes onto the end of transmit
chains. i was accounting for this extra descriptor when i was
completing the chain, but not when i was setting this up. this
meant the number of free descriptors kept growing until it overflowed.
at this point the check for space in the ring failed and packets
no longer flowed.

this counts the pad descriptor in the tx chain setup too.
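
A sketch of the accounting rule this fix describes, assuming one descriptor
per dma segment plus one pad descriptor for frames shorter than 60 bytes.
`tx_descs_needed` and `MIN_FRAME_LEN` are hypothetical names, not taken
from the driver.

```c
#include <assert.h>

#define MIN_FRAME_LEN 60	/* short frames are padded up to this length */

/*
 * Descriptors used by a packet of pktlen bytes mapped into nsegs
 * segments. The same count has to be used when the chain is set up
 * and when it is completed, otherwise the free count drifts.
 */
static unsigned int
tx_descs_needed(unsigned int nsegs, unsigned int pktlen)
{
	assert(nsegs > 0);
	return (nsegs + (pktlen < MIN_FRAME_LEN ? 1 : 0));
}
```

Using one helper on both the setup and completion sides is one way to keep
the two counts from disagreeing.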

ok deraadt@


# 1.28 23-Jun-2011 dlg

cope with empty rx rings by scheduling a timeout to keep trying until it
gets some packets onto the rings.

also annoying, but the hardware doesnt report empty rings, we have to
handle it ourselves.


# 1.27 23-Jun-2011 dlg

this chip has an annoying "feature" where it cannot report the link
state unless the chip is up and handling packets. while its down
it does not report the link state, so it is unknown.

this tweaks the link state handling, in particular it adds code to
myx_down so it moves the link state to unknown, ie, it correctly
reflects reality.

stupidity pointed out by deraadt


# 1.26 22-Jun-2011 deraadt

reset the tx_count on UP, since it may have been advanced from non-zero
by a previous use
ok claudio


# 1.25 22-Jun-2011 dlg

msi support. this is a complicated one...

ok kettenis@


# 1.24 22-Jun-2011 jsg

another myri10ge device matched by freebsd/linux drivers
ok dlg@


# 1.23 22-Jun-2011 dlg

oops, handle refill like i said i was going to two revisions ago.


# 1.22 22-Jun-2011 deraadt

set the mac address on the chip correctly (repair the byte order)
it now works on sparc64, too
ok dlg


# 1.21 22-Jun-2011 dlg

deraadt plugged his myx into a sparc64 and discovered 3 problems:

1. we want to write raw values to registers all the time, so promote the
myx_raw{read,write} to myx_{read,write} and use them everywhere. get rid
of the raw funcs.
2. i was setting the watermarks on the rx ring before knowing how big
they were.
3. rxfill in the interrupt handler could lose data if you loop on
sts_isvalid.

almost working now...

"please commit your diff" deraadt@


# 1.20 21-Jun-2011 dlg

do the unaligned dma tests so we can figure out if we need to fall
back to the unaligned firmware. apparently this is only an issue
on the "A" controllers which have been superseded, but those are
the chips we (openbsd devs) have.


# 1.19 21-Jun-2011 dlg

report the controllers part number. eg, i now know i have a
10G-PCIE-8A-R. dmesg looks like this:

myx0 at pci4 dev 0 function 0 "Myricom Z8E" rev 0x00: apic 1 int 8, model 10G-PCIE-8A-R, address 00:60:dd:47:c6:74


# 1.18 21-Jun-2011 dlg

wire up jumbos properly. the hardware supports up to 9018 bytes off
the wire (9000 + ether header + vlan tag), but has some cool alignment
requirements. if you want to use a single rx ring desc to point at
a jumbo it needs to start on a 4k boundary and be physically
contiguous. to ensure this im pulling frames from the 12k pool and
waiting for arianes diff to ensure mbufs are contig.

direction from andrew gallatin. tested locally.


# 1.17 21-Jun-2011 deraadt

minor cleanups; ok dlg


# 1.16 20-Jun-2011 dlg

make the interrupt handler look more like what the doco suggests. seems to
fix a bad lockup i kept getting.


# 1.15 20-Jun-2011 dlg

dont need debug, the myx_cmd stuff works fine.


# 1.14 20-Jun-2011 dlg

i got myx working!


# 1.13 02-May-2011 chl

Do not check malloc return value against NULL, as M_WAITOK is used.

ok dlg@ krw@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.12 19-May-2010 oga

BUS_DMA_ZERO instead of alloc, map, bzero.

ok krw@


Revision tags: OPENBSD_4_7_BASE
# 1.11 13-Aug-2009 jasper

- consistify cfdriver for the ethernet drivers (0 -> NULL)

ok dlg@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.10 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.9 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.8 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.7 23-May-2008 brad

Simplify the combination use of pci_mapreg_type()/pci_mapreg_map() as
suggested by dlg@ awhile ago.

ok dlg@


Revision tags: OPENBSD_4_3_BASE
# 1.6 16-Jan-2008 thib

Set the baudrate with IF_Gbps(10); and remove an
XXX comment now that if_baudrate is 64bits.

ok reyk@


Revision tags: OPENBSD_4_2_BASE
# 1.5 01-Jun-2007 reyk

initialize the rings


# 1.4 31-May-2007 reyk

further improvement of the bus space i/o. firmware loading, booting,
and card initialization works now.

thanks to dlg@ who pointed me to the fact that
bus_space_write_region_N and bus_space_write_raw_region_N use count of
elements vs. size of buffer arguments.
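
A tiny sketch of the distinction, as I read the message above: one family
takes a count of N-byte elements and the other takes the size of the buffer
in bytes. `region4_count` is a hypothetical helper for 4-byte elements and
is not part of bus_space(9).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* convert a buffer length in bytes into a count of 4-byte elements */
static size_t
region4_count(size_t len)
{
	assert(len % sizeof(uint32_t) == 0);
	return (len / sizeof(uint32_t));
}
```

Passing a byte length where an element count is expected would write four
times as much as intended in this 4-byte case.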


# 1.3 31-May-2007 reyk

enable all debugging messages by default if the driver is compiled with
MYX_DEBUG


# 1.2 31-May-2007 reyk

fix the myx_write function


# 1.1 31-May-2007 reyk

initial bits of a new driver for the Myricom Myri-10G Lanai-Z8E 10Gb
Ethernet chipset. not working yet.

ok dlg@


# 1.107 16-Apr-2019 dlg

i2c reads are more reliable a byte at a time.

reading all 256 at a time was a nice idea, but meant page 0xa2 wasnt
appearing like it should. this follows what freebsd does more
closely too.


# 1.106 16-Apr-2019 dlg

make sff page reads work on little endian archs too. like amd64.

some modules seem to need more time when waiting for bytes while here.

hrvoje popovski hit the endian issue


# 1.105 15-Apr-2019 dlg

implement SIOCGIFSFFPAGE so ifconfig can get transceiver info.

myx doesn't allow i2c writes, so you can only read whatever page
the firmware is already pointing at on device 0xa0. if you try to
read another page it will return ENXIO.

tested on a 10G-PCIE-8A-R with an xfp module.


# 1.104 15-Apr-2019 dlg

trim some debug code that printed out the name of a command

the list of commands is going to grow, but the thought of keeping
the list in debug code up to date with it just makes me feel tired.

this prints the command id number instead in the same format we
represent it in the header.


Revision tags: OPENBSD_6_2_BASE OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.103 01-Aug-2017 dlg

defer init of the myxmcl pool to mountroot, and enable pool cpu caches.

pool_cache_init cannot be called during autoconf because we cant
be confident about the number of cpus in the machine until the first
run of attaches.

mountroot is after autoconf, and myx already has code that runs
there for the firmware loading.

discussed with deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.102 07-Feb-2017 dlg

move the mbuf pools to m_pool_init and a single global memory limit

this replaces individual calls to pool_init, pool_set_constraints, and
pool_sethardlimit with calls to m_pool_init. m_pool_init inits the
mbuf pools with the mbuf pool allocator, and because of that doesnt
set per pool limits.

ok bluhm@ as part of a larger diff


# 1.101 24-Jan-2017 dlg

add support for multiple transmit ifqueues per network interface.

an ifq to transmit a packet is picked by the current traffic
conditioner (ie, priq or hfsc) by providing an index into an array
of ifqs. by default interfaces get a single ifq but can ask for
more using if_attach_queues().

the vast majority of our drivers still think there's a 1:1 mapping
between interfaces and transmit queues, so their if_start routines
take an ifnet pointer instead of a pointer to the ifqueue struct.
instead of changing all the drivers in the tree, drivers can opt
into using an if_qstart routine and setting the IFXF_MPSAFE flag.
the stack provides a compatability wrapper from the new if_qstart
handler to the previous if_start handlers if IFXF_MPSAFE isnt set.

enabling hfsc on an interface configures it to transmit everything
through the first ifq. any other ifqs are left configured as priq,
but unused, when hfsc is enabled.

getting this in now so everyone can kick the tyres.

ok mpi@ visa@ (who provided some tweaks for cnmac).


# 1.100 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.99 31-Oct-2016 dlg

turns out these chips can handle buffers up to 9400 bytes in length.

raise the mtu to 9380 bytes so we can take advantage of the extra space.

i need to revisit the macro names at some point.


# 1.98 31-Oct-2016 dlg

revert 1.97 where i moved myx to using the system pools

my early revision board doesnt like it at all


# 1.97 28-Oct-2016 dlg

get rid of the custom pool in myx for jumbo frames.

now it asks the mbuf layer for the 9k from its pools.

a question from chris@ made me go look at the chip doco again and i
realised that the chip only requires 4 byte alignment for rx buffers,
no 4k alignment for jumbo buffers.

i also found that the chip is supposed to be able to rx up to 9400
bytes instead of 9000. ill fix that later though.


# 1.96 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


Revision tags: OPENBSD_6_0_BASE
# 1.95 23-May-2016 tedu

remove the function pointer from mbufs. this memory is shared with data
via unions, and we don't want to make it easy to control the target.
instead an integer index into an array of acceptable functions is used.
drivers using custom functions must register them to receive an index.
ok deraadt


# 1.94 13-Apr-2016 mpi

G/C IFQ_SET_READY().


# 1.93 13-Apr-2016 mpi

G/C IFQ_SET_READY().


Revision tags: OPENBSD_5_9_BASE
# 1.92 11-Dec-2015 mpi

Replace mountroothook_establish(9) by config_mountroot(9) a narrower API
similar to config_defer(9).

ok mikeb@, deraadt@


# 1.91 09-Dec-2015 dlg

rework the if_start mpsafe serialisation so it can serialise arbitrary work

work is represented by struct task.

the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.

this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified in between when a start
routine discovers theres no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.

by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.

tested on various nics
ok mpi@


# 1.90 03-Dec-2015 dlg

tell the stack myx_start is mpsafe.

as per the stack commit, the driver changes are:

1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)


# 1.89 01-Dec-2015 dlg

myx doesnt use atomic.h anymore.


# 1.88 25-Nov-2015 dlg

replace IFF_OACTIVE manipulation with mpsafe operations.

there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.

IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operations mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.

instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.

this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifq_{set,clr,is}_oactive API too.

ok kettenis@ mpi@ jmatthew@ deraadt@
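
A rough userland sketch of why 1.88 gives the oactive mark its own word: it can then be set and cleared without a read-modify-write on a flags field shared with other bits. The struct and the txq_* names are hypothetical, and C11 atomics are an assumption standing in for whatever the kernel uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for per-queue state; not struct ifqueue itself. */
struct txq_sketch {
	atomic_uint oactive;	/* own word, so no other flag shares it */
};

static inline void
txq_init(struct txq_sketch *q)
{
	assert(q != NULL);
	atomic_init(&q->oactive, 0);
}

/* Mark the queue as unable to take more packets right now. */
static inline void
txq_set_oactive(struct txq_sketch *q)
{
	atomic_store(&q->oactive, 1);
}

/* Clear the mark once the transmit ring has space again. */
static inline void
txq_clr_oactive(struct txq_sketch *q)
{
	atomic_store(&q->oactive, 0);
}

static inline int
txq_is_oactive(struct txq_sketch *q)
{
	return (atomic_load(&q->oactive) != 0);
}
```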


# 1.87 24-Nov-2015 dlg

fix tx ring accounting in myx_start.

turns out i was calculating the number of packets (not descriptors)
on the tx ring, and then using that as the free space for descriptors.


# 1.86 19-Nov-2015 dlg

get rid of sc_tx_free and the atomic ops on it in myx_start and myx_txeof.

myx_start calculates the free space by reading the consumer index
and doing some maths, which lets us avoid the interlocked cpu ops.
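
The "some maths" in 1.86 can be sketched as below, assuming free-running 32-bit producer and consumer counters; the function name and the counter layout are assumptions, not the driver's actual code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Free descriptors on a ring, given free-running producer and consumer
 * counters. Unsigned subtraction wraps modulo 2^32, so the result stays
 * correct after the counters overflow.
 */
static inline uint32_t
ring_free(uint32_t prod, uint32_t cons, uint32_t ring_size)
{
	uint32_t used = prod - cons;

	assert(used <= ring_size);
	return (ring_size - used);
}
```

Only the start routine advances the producer and only the completion side advances the consumer, so reading the consumer once is enough and no shared free counter has to be updated atomically.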


# 1.85 25-Oct-2015 mpi

arp_ifinit() is no longer needed.


# 1.84 29-Sep-2015 dlg

get rid of the mutex between access to the status block and myx_down

myx is unusual in that it has an explicit command to shut down the
chip that gets an interrupt when it's done. so myx_down sends the
command and has to sleep until it gets that interrupt. this moves
to using a single int to represent that state (so loads and stores
are atomic), and sleep_setup/sleep_finish in myx_down to wait for
it to change.

this has been running in production at work for a few months now
tested by chris@


# 1.83 01-Sep-2015 deraadt

free() firmware with right len; ok dlg


# 1.82 15-Aug-2015 dlg

do the global tx free accounting in myx_start with a single atomic op
instead of one per packet.

seems to let me send packets a little faster.


# 1.81 15-Aug-2015 dlg

rework the tx path to use a ring to keep track of dmamaps/mbufs.

this removes the myx_buf structure and uses myx_slot instead. theyre
the same except slots dont have list entry structures, so theyre
smaller.

this cuts out four mutex ops per packet out of the tx handling.
just have to get rid of the atomic op per packet in myx_start now.


# 1.80 14-Aug-2015 dlg

move to a per rx ring timeout for refilling empty rings.

this lets me get rid of the locking around the refilling of the rx ring.

the timeout only runs refill if the rx ring is empty. we know rxeof
wont try and refill it in that situation because there's no packets
on the ring so we wont get interrupts for it. therefore we dont
need to lock between the timeout and rxeof cos they cant run at the
same time.


# 1.79 14-Aug-2015 dlg

rework how we track the packets on the rx rings.

originally there were two mutex protected lists for rx packets, a
list of free packets, and a list of packets that were on the ring.
filling the ring popped packets off the free list, attached an mbuf
and dmamapped it, and pushed it onto the list of active packets.
the hw fills packets in order, so on rx completion we'd pop packets
off the active list, unmap the mbuf and shove it up the stack before
putting the packet on the free list.

the problem with the lists is that every rx ring operation resulted
in two mutex ops. so 4 mutex ops per packet after you do both fill
and rxeof.

this replaces the mutexed lists with rings that shadow the hardware
rings. filling the rx ring pushes a producer index along, while
rxeof chases it with a consumer. because we know only one thing can
do either of those tasks at a time, we can get away with not using
atomic ops for them.

there's more to be done, but this is a good first step.
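
A small sketch of the shadow ring described in 1.79, assuming one context fills and one context completes. The struct, the slot count, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 4	/* hypothetical size; must be a power of two */

struct shadow_ring {
	void		*slot[RING_SLOTS];
	unsigned int	 prod;	/* advanced only by the fill side */
	unsigned int	 cons;	/* advanced only by the completion side */
};

/* Fill side: put a buffer on the ring. Returns 0 if the ring is full. */
int
ring_fill(struct shadow_ring *r, void *buf)
{
	assert(buf != NULL);
	if (r->prod - r->cons == RING_SLOTS)
		return (0);
	r->slot[r->prod % RING_SLOTS] = buf;
	r->prod++;
	return (1);
}

/* Completion side: take the oldest buffer, or NULL if the ring is empty. */
void *
ring_complete(struct shadow_ring *r)
{
	void *buf;

	if (r->prod == r->cons)
		return (NULL);
	buf = r->slot[r->cons % RING_SLOTS];
	r->cons++;
	return (buf);
}
```

Each index has a single writer, which is what lets the list mutexes go away in this sketch; a real multi-CPU version would still need to think about memory ordering.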


Revision tags: OPENBSD_5_8_BASE
# 1.78 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.77 17-May-2015 chris

We don't need KERNEL_LOCK() around if_input() anymore, as if_input() has
appropriate locking around bpf now.

ok dlg@


# 1.76 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.75 20-Feb-2015 chris

Now that if_input() is a thing, use it

ok dlg@


# 1.74 18-Feb-2015 dlg

myri employees and their drivers for linux and solaris have repeatedly
told me that if you're going to rx into buffers greater than 4k in
size, they have to be aligned to a 4k boundary.

the mru of this chip is 9k, but ive been using the 12k mcl pool to
provide the alignment. however, if we move to putting 8 items on a
pool page there'll be enough slack space in the mcl12k pool pages
to allow item colouring, which in turn will break the chip requirement
above. in practice the chips i have seem to work fine with unaligned
buffers, but i dont want to risk breaking early revision chips.

this moves myx to using a private pool for allocating clusters for
the big rx ring. the item size is 9k, but we specify a 4k alignment
so every item we get out of it will be correct for the chip.
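
The alignment arithmetic behind 1.74 can be sketched like this, assuming "9k" means 9 * 1024 bytes and "4k" means 4096 bytes. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
static inline uintptr_t
align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((addr + align - 1) & ~(align - 1));
}

/* True if addr sits on an align boundary. */
static inline int
is_aligned(uintptr_t addr, uintptr_t align)
{
	return ((addr & (align - 1)) == 0);
}
```

With these numbers a 9216 byte item rounds up to a 12288 byte stride, so consecutive items each start on a 4k boundary as long as the first one does.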


# 1.73 18-Feb-2015 dlg

enable pcie relaxed transaction ordering and bump the max payload
size up to 4k.

found while reading someone elses driver.


# 1.72 22-Dec-2014 tedu

unifdef INET


# 1.71 28-Oct-2014 dlg

the if_rxring accounting would get screwed up if the first mbuf to
be put on the ring couldnt be allocated.

this pulls the code that puts the mbufs on the ring out of myx_rx_fill
so it can return early if firstmb cant be allocated, which puts it
in the right place to return unused slots to the if_rxring.

this means myx rx wont lock up if you're DoSsed to the point where
you exhaust your mbuf pools and cant allocate mbufs for the ring.

ok jmatthew@


# 1.70 04-Oct-2014 dlg

replace mutexes to serialise the operations on the flag that restricts
the number of contexts that are refilling the rx rings with atomic
ops.

this is borrowed from code i wrote for the scsi midlayer but cant
put in yet because i havent got atomic.h up to scratch on all archs
yet. the archs myx runs on do have enough atomic.h to be fine though.


# 1.69 03-Oct-2014 dlg

refill the rx ring in myx_rxeof, not much later at the end of myx_intr.


# 1.68 03-Oct-2014 dlg

in rxeof, instead of taking the biglock on every packet to call bpf
and ether_input, queue all the mbufs onto an mbuf_list on the stack
and then take the biglock once outside the loop.
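
A userland sketch of the batching in 1.68: packets are queued on a local list without any lock, and the lock is taken once for the whole batch. The list type, the counters, and the biglock_* stand-ins are hypothetical and only count acquisitions.

```c
#include <assert.h>
#include <stddef.h>

struct pkt {
	struct pkt	*next;
	int		 id;
};

struct pkt_list {
	struct pkt	 *head;
	struct pkt	**tailp;
	unsigned int	  len;
};

static unsigned int biglock_taken;	/* how often the stand-in lock was taken */
static unsigned int delivered;		/* packets handed to the "stack" */

static void biglock_enter(void) { biglock_taken++; }
static void biglock_leave(void) { }

void
pkt_list_init(struct pkt_list *pl)
{
	pl->head = NULL;
	pl->tailp = &pl->head;
	pl->len = 0;
}

/* Append to the local list; no lock is needed because the list is private. */
void
pkt_list_enqueue(struct pkt_list *pl, struct pkt *p)
{
	assert(p != NULL);
	p->next = NULL;
	*pl->tailp = p;
	pl->tailp = &p->next;
	pl->len++;
}

/* Deliver the whole batch under a single lock acquisition. */
void
deliver_batch(struct pkt_list *pl)
{
	struct pkt *p, *next;

	if (pl->head == NULL)
		return;

	biglock_enter();
	for (p = pl->head; p != NULL; p = next) {
		next = p->next;
		delivered++;	/* stand-in for passing the packet up */
	}
	biglock_leave();

	pkt_list_init(pl);
}
```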


# 1.67 03-Oct-2014 dlg

we dont need the kernel lock to call bus_dmamap_load and unload thanks
to kettenis.

move the if_ipackets and if_opackets increments out of biglock too.
theyre only updated from the interrupt handler, which is only run
on a single cpu so there's no chance of the update racing. everywhere
else only reads them.


# 1.66 03-Oct-2014 dlg

dont need to hold the kernel lock to call MCLGETI and m_freem now.


# 1.65 03-Oct-2014 dlg

dont take the kernel lock on every interrupt in case we might change
the link state or to clear OACTIVE, just take it when we know we
really are going to do those things.


# 1.64 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


# 1.63 19-Aug-2014 dlg

in myx_start, replace

while (space) {
IFQ_POLL;
myx_dequeue(free descr);
IFQ_DEQUEUE;
etc;
}

with

while (space && myx_dequeue(free descr)) {
IFQ_DEQUEUE;
etc;
}


# 1.62 18-Aug-2014 dlg

dont rely on mbuf.h to provide pool.h.

ok miod@, who has offered to help with any MD fallout
ok guenther@


Revision tags: OPENBSD_5_6_BASE
# 1.61 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.60 10-Jul-2014 dlg

rings that dont rx packets dont need to be refilled.


# 1.59 08-Jul-2014 dlg

cut things that relied on mclgeti for rx ring accounting/restriction over
to using if_rxr.

cut the reporting systat did over to the rxr ioctl.

tested as much as i can on alpha, amd64, and sparc64.
mpi@ has run it on macppc.
ok mpi@


# 1.58 17-Jun-2014 dlg

whitespace fix.

im sick of fixing this by hand on all my boxes while hacking on
other stuff and having it pollute my diffs.

no functional change.


# 1.57 24-Mar-2014 dlg

nothing after the irq ack posting relies on it being ordered.


Revision tags: OPENBSD_5_5_BASE
# 1.56 10-Feb-2014 dlg

the mac addresses you program with MYXCMD_SET_MCASTGROUP are in a different
format to the one used for MYXCMD_SET_LLADDR. for reasons.

this lets ospf work if you dont happen to have PROMISC enabled on your
interface like my production firewalls happen to have, which is why i
never noticed this before.


# 1.55 05-Feb-2014 dlg

after running myx(4) without biglock in production for a few days
i discovered that there's a race between the interrupt code and
myx_start which causes the count of free tx descriptors to get
distorted, which eventually leads to a permanent setting of
IFF_OACTIVE, which in turn prevents the driver from transmitting
packets.

fixing that went horribly wrong when i then discovered that there's
a race between the interrupt handler and myx_down, where the interrupt
can tell myx_down to wake up and free all the rings while the
interrupt handler is still looking at them. free panics for all.

this moves the handling of the tx free count under the biglock (for
now), and moves myx_up and myx_down to managing a "driver state"
variable independently of the IFF_UP and IFF_RUNNING flags, and
very very careful reordering of the checks of that state variable
and the hardware state.

as a bonus we get to avoid excessive calls to myx_txeof and myx_rxeof
in the isr, and less stuff checked unconditionally. on the other
hand, the sc_state handling added some more checks so it might not
be a win overall.

tested on smp sparc64 with msi and nonmsi interrupts, and on amd64 smp
in production again.


# 1.54 31-Jan-2014 dlg

sc_function is set, but never used for anything useful. clean it up...


# 1.53 31-Jan-2014 dlg

sc_lladdr is never used, so we can get the space in the sc back.


# 1.52 23-Jan-2014 dlg

a lot of people have pointed out to me that taking a lock just to read an
int isnt necessary.


# 1.51 23-Jan-2014 dlg

factor the mutex/bus_space handling of the sts block out.


# 1.50 21-Jan-2014 dlg

introduce fine grained locking.

this doesnt give up the big lock coming from process context, only from
the interrupt side. it is excessively careful about when it takes
the big lock again. notably it goes to a lot of effort to not hold
a mutex while calling into other subsystems or before taking the
big lock.

ive been hitting it as hard as i can without problems.

intensely read by mpi@
ok claudio@ kettenis@


# 1.49 19-Jan-2014 dlg

white space fix


# 1.48 19-Jan-2014 dlg

introduce fine grained locking around the lists of packet handlers
myx maintains. this moves it away from relying on splnet to protect
them.


# 1.47 19-Jan-2014 dlg

hwflags is never used, so clean it up


# 1.46 19-Jan-2014 dlg

replace bcmp with memcmp


# 1.45 19-Jan-2014 dlg

bcopy to memcpy


# 1.44 19-Jan-2014 dlg

replace bzero with memset.


# 1.43 19-Jan-2014 dlg

all 64bit archs myx runs on support bus_space 8 things because of work i
did at n2k13.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.42 29-Jan-2013 brad

- Set ENETRESET within myx_ioctl() instead of calling myx_iff() directly, to be
consistent with other drivers.
- Clear IFF_ALLMULTI flag early and at the top of myx_iff().
- Set IFF_ALLMULTI when in promisc mode.

ok dlg@


# 1.41 25-Jan-2013 dlg

we go to a lot of effort to post the first tx descriptor last, but we
really should be trying to post everything except the flags field in the
first tx descriptor. this shuffles things around so the rest of that first
txd is posted as part of the "everything else" before its flags field.


# 1.40 25-Jan-2013 dlg

the myx_dmamem struct doesnt need a name.


# 1.39 21-Jan-2013 dlg

myx does reads and writes in one direction to packet buffers. lets try
STREAMING them.


# 1.38 15-Jan-2013 dlg

amd64 is currently broken cos it has no
bus_space_write_raw_region_8. disabling its use for now.


# 1.37 15-Jan-2013 dlg

use bus_space_write_raw_region_8 on 64bit archs when writing to the rings


# 1.36 14-Jan-2013 dlg

map the registers PREFETCHABLE so things that can do write combining can
try and do write combining like the myx doco likes.


# 1.35 14-Jan-2013 dlg

avoid extra bus_space barriers in the interrupt handler.


# 1.34 14-Jan-2013 dlg

when posting descriptors to the chips rings, avoid going write barrier
write barrier write barrier when using myx_write to post descriptors.

instead let it go write write write barrier by using the appropriate
bus_space write directly followed by a single bus_space barrier.

the story above is mostly true, except that myx wants us to write all the
descriptors except the first, barrier, and then write the first one out to
signal that the chip can proceed.

it is also worth noting that the barriers cover more address space than
what we actually wrote to. this makes the code much simpler, and avoids
generating extra fence operations (which is what barrier functions end up
as on most of our archs) when we wrap around the end of the ring. the
bus_space doco encourages this.

bus_space use was discussed with krw@ kettenis@ deraadt@
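
The posting order described in 1.34 can be sketched in userland as below. The descriptor layout and post_chain are hypothetical, and atomic_thread_fence is only a stand-in for a bus_space barrier, which is an assumption of this sketch and not what the driver calls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor layout, for illustration only. */
struct txd {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
};

/*
 * Post a chain of n descriptors starting at ring index start.
 * Everything except the first descriptor is written back to back,
 * one barrier orders those writes, and the first descriptor goes
 * out last to signal that the chain is complete.
 */
void
post_chain(struct txd *ring, unsigned int ring_size, unsigned int start,
    const struct txd *chain, unsigned int n)
{
	unsigned int i;

	assert(n > 0 && n <= ring_size);

	/* write write write: descriptors 1..n-1, wrapping at the ring end */
	for (i = 1; i < n; i++)
		ring[(start + i) % ring_size] = chain[i];

	/* a single barrier covers all of the writes above */
	atomic_thread_fence(memory_order_release);

	/* the first descriptor is posted last */
	ring[start % ring_size] = chain[0];
	atomic_thread_fence(memory_order_release);
}
```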


# 1.33 14-Jan-2013 dlg

the myri doco suggests its nice to post stuff by filling in everything
in the rings except the first descriptor. once you've written as
much as you can out, then you go back and post the first descriptor
to signal that the chip should go ahead and work.


# 1.32 14-Jan-2013 dlg

;; is a long way of saying ;


# 1.31 29-Nov-2012 brad

Remove setting an initial assumed baudrate upon driver attach which is not
necessarily correct, there might not even be a link when attaching.

ok mikeb@ reyk@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.30 28-Nov-2011 blambert

Fix reversed error-handling gotos in myx_buf_fill(), which would lead to
either an mbuf leak or a NULL pointer dereference.

ok sthen@ claudio@ dlg@
testing claudio@ dlg@


Revision tags: OPENBSD_5_0_BASE
# 1.29 08-Aug-2011 dlg

myx requires the driver pad short ethernet frames to 60 bytes by
adding a descriptor pointing at zeroed bytes onto the end of transmit
chains. i was accounting for this extra descriptor when i was
completing the chain, but not when i was setting this up. this
meant the number of free descriptors kept growing until it overflowed.
at this point the check for space in the ring failed and packets
no longer flowed.

this counts the pad descriptor in the tx chain setup too.
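
A sketch of the accounting fixed in 1.29, assuming one descriptor per DMA segment plus one extra pad descriptor for frames shorter than 60 bytes. The names are hypothetical; the point is that setup and completion must use the same count.

```c
#include <assert.h>

#define PAD_MIN_LEN 60	/* short frames are padded up to this length */

/*
 * Descriptors consumed by one packet: one per segment, plus one pad
 * descriptor when the frame is shorter than PAD_MIN_LEN bytes.
 */
static inline unsigned int
tx_desc_count(unsigned int nsegs, unsigned int pktlen)
{
	assert(nsegs > 0);
	return (nsegs + (pktlen < PAD_MIN_LEN ? 1 : 0));
}
```

If the setup path subtracts nsegs while the completion path adds back tx_desc_count(), the free count drifts upward by one for every short frame, which matches the overflow described above.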

ok deraadt@


# 1.28 23-Jun-2011 dlg

cope with empty rx rings by scheduling a timeout to keep trying until it
gets some packets onto the rings.

also annoying, but the hardware doesnt report empty rings, we have to
handle it ourselves.


# 1.27 23-Jun-2011 dlg

this chip has an annoying "feature" where it cannot report the link
state unless the chip is up and handling packets. while its down
it does not report the link state, so it is unknown.

this tweaks the link state handling, in particular it adds code to
myx_down so it moves the link state to unknown, ie, it correctly
reflects reality.

stupidity pointed out by deraadt


# 1.26 22-Jun-2011 deraadt

reset the tx_count on UP, since it may have been advanced from non-zero
by a previous use
ok claudio


# 1.25 22-Jun-2011 dlg

msi support. this is a complicated one...

ok kettenis@


# 1.24 22-Jun-2011 jsg

another myri10ge device matched by freebsd/linux drivers
ok dlg@


# 1.23 22-Jun-2011 dlg

oops, handle refill like i said i was going to two revisions ago.


# 1.22 22-Jun-2011 deraadt

set the mac address on the chip correctly (repair the byte order)
it now works on sparc64, too
ok dlg


# 1.21 22-Jun-2011 dlg

deraadt plugged his myx into a sparc64 and discovered 3 problems:

1. we want to write raw values to registers all the time, so promote the
myx_raw{read,write} to myx_{read,write} and use them everywhere. get rid
of the raw funcs.
2. i was setting the watermarks on the rx ring before knowing how big
they were.
3. rxfill in the interrupt handler could lose data if you loop on
sts_isvalid.

almost working now...

"please commit your diff" deraadt@


# 1.20 21-Jun-2011 dlg

do the unaligned dma tests so we can figure out if we need to fall
back to the unaligned firmware. apparently this is only an issue
on the "A" controllers which have been superseded, but those are
the chips we (openbsd devs) have.


# 1.19 21-Jun-2011 dlg

report the controllers part number. eg, i now know i have a
10G-PCIE-8A-R. dmesg looks like this:

myx0 at pci4 dev 0 function 0 "Myricom Z8E" rev 0x00: apic 1 int 8, model 10G-PCIE-8A-R, address 00:60:dd:47:c6:74


# 1.18 21-Jun-2011 dlg

wire up jumbos properly. the hardware supports up to 9018 bytes off
the wire (9000 + ether header + vlan tag), but has some cool alignment
requirements. if you want to use a single rx ring desc to point at
a jumbo it needs to start on a 4k boundary and be physically
contiguous. to ensure this im pulling frames from the 12k pool and
waiting for arianes diff to ensure mbufs are contig.

direction from andrew gallatin. tested locally.


# 1.17 21-Jun-2011 deraadt

minor cleanups; ok dlg


# 1.16 20-Jun-2011 dlg

make the interrupt handler look more like what the doco suggests. seems to
fix a bad lockup i kept getting.


# 1.15 20-Jun-2011 dlg

dont need debug, the myx_cmd stuff works fine.


# 1.14 20-Jun-2011 dlg

i got myx working!


# 1.13 02-May-2011 chl

Do not check malloc return value against NULL, as M_WAITOK is used.

ok dlg@ krw@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.12 19-May-2010 oga

BUS_DMA_ZERO instead of alloc, map, bzero.

ok krw@


Revision tags: OPENBSD_4_7_BASE
# 1.11 13-Aug-2009 jasper

- consistify cfdriver for the ethernet drivers (0 -> NULL)

ok dlg@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.10 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.9 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.8 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.7 23-May-2008 brad

Simplify the combination use of pci_mapreg_type()/pci_mapreg_map() as
suggested by dlg@ awhile ago.

ok dlg@


Revision tags: OPENBSD_4_3_BASE
# 1.6 16-Jan-2008 thib

Set the baudrate with IF_Gbps(10); and remove an
XXX comment now that if_baudrate is 64bits.

ok reyk@


Revision tags: OPENBSD_4_2_BASE
# 1.5 01-Jun-2007 reyk

initialize the rings


# 1.4 31-May-2007 reyk

further improvement of the bus space i/o. firmware loading, booting,
and card initialization works now.

thanks to dlg@ who pointed me to the fact that
bus_space_write_region_N and bus_space_write_raw_region_N use count of
elements vs. size of buffer arguments.


# 1.3 31-May-2007 reyk

enable all debugging messages by default if the driver is compiled with
MYX_DEBUG


# 1.2 31-May-2007 reyk

fix the myx_write function


# 1.1 31-May-2007 reyk

initial bits of a new driver for the Myricom Myri-10G Lanai-Z8E 10Gb
Ethernet chipset. not working yet.

ok dlg@


Revision tags: OPENBSD_6_2_BASE
# 1.103 01-Aug-2017 dlg

defer init of the myxmcl pool to mountroot, and enable pool cpu caches.

pool_cache_init cannot be called during autoconf because we cant
be confident about the number of cpus in the machine until the first
run of attaches.

mountroot is after autoconf, and myx already has code that runs
there for the firmware loading.

discussed with deraadt@


Revision tags: OPENBSD_6_1_BASE
# 1.102 07-Feb-2017 dlg

move the mbuf pools to m_pool_init and a single global memory limit

this replaces individual calls to pool_init, pool_set_constraints, and
pool_sethardlimit with calls to m_pool_init. m_pool_init inits the
mbuf pools with the mbuf pool allocator, and because of that doesnt
set per pool limits.

ok bluhm@ as part of a larger diff


# 1.101 24-Jan-2017 dlg

add support for multiple transmit ifqueues per network interface.

an ifq to transmit a packet is picked by the current traffic
conditioner (ie, priq or hfsc) by providing an index into an array
of ifqs. by default interfaces get a single ifq but can ask for
more using if_attach_queues().

the vast majority of our drivers still think there's a 1:1 mapping
between interfaces and transmit queues, so their if_start routines
take an ifnet pointer instead of a pointer to the ifqueue struct.
instead of changing all the drivers in the tree, drivers can opt
into using an if_qstart routine and setting the IFXF_MPSAFE flag.
the stack provides a compatability wrapper from the new if_qstart
handler to the previous if_start handlers if IFXF_MPSAFE isnt set.

enabling hfsc on an interface configures it to transmit everything
through the first ifq. any other ifqs are left configured as priq,
but unused, when hfsc is enabled.

getting this in now so everyone can kick the tyres.

ok mpi@ visa@ (who provided some tweaks for cnmac).


# 1.100 22-Jan-2017 dlg

move counting if_opackets next to counting if_obytes in if_enqueue.

this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.

ok mpi@ deraadt@


# 1.99 31-Oct-2016 dlg

turns out these chips can handle buffers up to 9400 bytes in length.

raise the mtu to 9380 bytes so we can take advantage of the extra space.

i need to revisit the macro names at some point.


# 1.98 31-Oct-2016 dlg

revert 1.97 where i moved myx to using the system pools

my early revision board doesnt like it at all


# 1.97 28-Oct-2016 dlg

get rid of the custom pool in myx for jumbo frames.

now it asks the mbuf layer for the 9k from its pools.

a question from chris@ made me go look at the chip doco again and i
realised that the chip only requires 4 byte alignment for rx buffers,
no 4k alignment for jumbo buffers.

i also found that the chip is supposed to be able to rx up to 9400
bytes instead of 9000. ill fix that later though.


# 1.96 15-Sep-2016 dlg

all pools have their ipl set via pool_setipl, so fold it into pool_init.

the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.

most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.

the manpage and subr_pool.c bits i did myself.

ok tedu@ jmatthew@

@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);


Revision tags: OPENBSD_6_0_BASE
# 1.95 23-May-2016 tedu

remove the function pointer from mbufs. this memory is shared with data
via unions, and we don't want to make it easy to control the target.
instead an integer index into an array of acceptable functions is used.
drivers using custom functions must register them to receive an index.
ok deraadt


# 1.94 13-Apr-2016 mpi

G/C IFQ_SET_READY().


# 1.93 13-Apr-2016 mpi

G/C IFQ_SET_READY().


Revision tags: OPENBSD_5_9_BASE
# 1.92 11-Dec-2015 mpi

Replace mountroothook_establish(9) by config_mountroot(9) a narrower API
similar to config_defer(9).

ok mikeb@, deraadt@


# 1.91 09-Dec-2015 dlg

rework the if_start mpsafe serialisation so it can serialise arbitrary work

work is represented by struct task.

the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.

this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified in between when a start
routine discovers theres no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.

by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.

tested on various nics
ok mpi@


# 1.90 03-Dec-2015 dlg

tell the stack myx_start is mpsafe.

as per the stack commit, the driver changes are:

1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)


# 1.89 01-Dec-2015 dlg

myx doesnt use atomic.h anymore.


# 1.88 25-Nov-2015 dlg

replace IFF_OACTIVE manipulation with mpsafe operations.

there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.

IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operates mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.

instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.

this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifsq_{set,clr_is}_oactive API too.

ok kettenis@ mpi@ jmatthew@ deraadt@


# 1.87 24-Nov-2015 dlg

fix tx ring accounting in myx_start.

turns out i was calculating the number of packets (not descriptors)
on the tx ring, and then using that as the free space for descriptors.


# 1.86 19-Nov-2015 dlg

get rid of sc_tx_free and the atomic ops on it in myx_start and myx_txeof.

myx_start calculates the free space by reading the consumer index
and doing some maths, which lets us avoid the interlocked cpu ops.


# 1.85 25-Oct-2015 mpi

arp_ifinit() is no longer needed.


# 1.84 29-Sep-2015 dlg

get rid of the mutex between access to the status block and myx_down

myx is unusual in that it has an explicit command to shut down the
chip that gets an interrupt when it's done. so myx_down sends the
command and has to sleep until it gets that interrupt. this moves
to using a single int to represent that state (so loads and stores
are atomic), and sleep_setup/sleep_finish in myx_down to wait for
it to change.

this has been running in production at work for a few months now
tested by chris@


# 1.83 01-Sep-2015 deraadt

free() firmware with right len; ok dlg


# 1.82 15-Aug-2015 dlg

do the global tx free accounting in myx_start with a single atomic op
instead of one per packet.

seems to let me send packets a little faster.


# 1.81 15-Aug-2015 dlg

rework the tx path to use a ring to keep track of dmamaps/mbufs.

this removes the myx_buf structure and uses myx_slot instead. theyre
the same expcet slots dont have list entry structures, so theyre
smaller.

this cuts out four mutex ops per packet out of the tx handling.
just have to get rid of the atomic op per packet in myx_start now.


# 1.80 14-Aug-2015 dlg

move to a per rx ring timeout for refilling empty rings.

this lets me get rid of the locking around the refilling of the rx ring.

the timeout only runs refill if the rx ring is empty. we know rxeof
wont try and refill it in that situation because there's no packets
on the ring so we wont get interrupts for it. therefore we dont
need to lock between the timeout and rxeof cos they cant run at the
same time.


# 1.79 14-Aug-2015 dlg

rework how we track the packets on the rx rings.

originally there were two mutex protected lists for rx packets, a
list of free packets, and a list of packets that were on the ring.
filling the ring popped packets off the free list, attached an mbuf
and dmamapped it, and pushed it onto the list of active packets.
the hw fills packets in order, so on rx completion we'd pop packets
the active list, unmap the mbuf and shove it up the stack before
putting the packet on the free list.

the problem with the lists is that every rx ring operation resulted
in two mutex ops. so 4 mutex ops per packet after you do both fill
and rxeof.

this replaces the mutexed lists with rings that shadow the hardware
rings. filling the rx ring pushes a producer index along, while
rxeof chases it with a consumer. because we know only one thing can
do either of those tasks at a time, we can get away with not using
atomic ops for them.

there's more to be done, but this is a good first step.


Revision tags: OPENBSD_5_8_BASE
# 1.78 24-Jun-2015 mpi

Increment if_ipackets in if_input().

Note that pseudo-drivers not using if_input() are not affected by this
conversion.

ok mikeb@, kettenis@, claudio@, dlg@


# 1.77 17-May-2015 chris

We don't need KERNEL_LOCK() around if_input() anymore, as if_input() has
appropriate locking around bpf now.

ok dlg@


# 1.76 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.75 20-Feb-2015 chris

Now that if_input() is a thing, use it

ok dlg@


# 1.74 18-Feb-2015 dlg

myri employees and their drivers for linux and solaris have repeatedly
told me that if you're going to rx into buffers greater than 4k in
size, they have to be aligned to a 4k boundary.

the mru of this chip is 9k, but ive been using the 12k mcl pool to
provide the alignment. however, if we move to putting 8 items on a
pool page there'll be enough slack space in the mcl12k pool pages
to allow item colouring, which in turn will break the chip requirement
above. in practice the chips i have seem to work fine with unaligned
buffers, but i dont want to risk breaking early revision chips.

this moves myx to using a private pool for allocating clusters for
the big rx ring. the item size is 9k, but we specify a 4k alignment
so every item we get out of it will be correct for the chip.
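
A rough userland illustration of the "9k item, 4k alignment" requirement above. posix_memalign() stands in for the kernel pool allocator here; the macro and function names are hypothetical and are not taken from the driver.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define RX_ITEM_SIZE	(9 * 1024)	/* item size from the commit message */
#define RX_ITEM_ALIGN	4096		/* alignment from the commit message */

/* True when addr is on an align-byte boundary; align must be a power of two. */
static int
is_aligned(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((addr & (align - 1)) == 0);
}

/* Userland stand-in for taking one item from the private pool. */
static void *
rx_item_alloc(void)
{
	void *p;

	if (posix_memalign(&p, RX_ITEM_ALIGN, RX_ITEM_SIZE) != 0)
		return (NULL);
	return (p);
}
```

Every pointer returned by rx_item_alloc() passes is_aligned(p, RX_ITEM_ALIGN), which is the property the chip requirement asks for; release the item with free().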


# 1.73 18-Feb-2015 dlg

enable pcie relaxed transaction ordering and bump the max payload
size up to 4k.

found while reading someone elses driver.


# 1.72 22-Dec-2014 tedu

unifdef INET


# 1.71 28-Oct-2014 dlg

the if_rxring accounting would get screwed up if the first mbuf to
be put on the ring couldnt be allocated.

this pulls the code that puts the mbufs on the ring out of myx_rx_fill
so it can return early if firstmb cant be allocated, which puts it
in the right place to return unused slots to the if_rxring.

this means myx rx wont lock up if you're DoSsed to the point where
you exhaust your mbuf pools and cant allocate mbufs for the ring.

ok jmatthew@


# 1.70 04-Oct-2014 dlg

replace mutexes to serialise the operations on the flag that restricts
the number of contexts that are refilling the rx rings with atomic
ops.

this is borrowed from code i wrote for the scsi midlayer but cant
put in yet because i havent got atomic.h up to scratch on all archs
yet. the archs myx runs on do have enough atomic.h to be fine though.
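
One way such a gate can be built from atomic ops, sketched with C11 <stdatomic.h> rather than the kernel's atomic.h. The struct and function names are assumptions for illustration and this is not claimed to be the driver's implementation.

```c
#include <assert.h>
#include <stdatomic.h>

struct refill_gate {
	atomic_uint running;	/* 0: idle, 1: one refiller, >1: refill requested again */
};

/* Returns 1 if the caller is now the only context refilling, 0 otherwise. */
static int
gate_enter(struct refill_gate *g)
{
	return (atomic_fetch_add(&g->running, 1) == 0);
}

/*
 * Returns 1 if another context asked for a refill while we held the gate,
 * in which case the caller should refill once more; 0 when the gate is
 * released.
 */
static int
gate_leave(struct refill_gate *g)
{
	unsigned int expected = 1;

	assert(atomic_load(&g->running) != 0);	/* caller must hold the gate */
	if (atomic_compare_exchange_strong(&g->running, &expected, 0))
		return (0);
	/* collapse all pending requests into a single extra pass */
	atomic_store(&g->running, 1);
	return (1);
}
```

A caller would use it as `if (gate_enter(&g)) { do { refill(); } while (gate_leave(&g)); }`, where refill() is whatever fills the ring. Contexts that lose the race return immediately and rely on the winner to do one more pass.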


# 1.69 03-Oct-2014 dlg

refill the rx ring in myx_rxeof, not much later at the end of myx_intr.


# 1.68 03-Oct-2014 dlg

in rxeof, instead of taking the biglock on every packet to call bpf
and ether_input, queue all the mbufs onto an mbuf_list on the stack
and then take the biglock once outside the loop.


# 1.67 03-Oct-2014 dlg

we dont need the kernel lock to call bus_dmamap_load and unload thanks
to kettenis.

move the if_ipacket and if_opacket increments out of biglock too.
theyre only updated from the interrupt handler, which is only run
on a single cpu so there's no chance of the update racing. everywhere
else only reads them.


# 1.66 03-Oct-2014 dlg

dont need to hold the kernel lock to call MCLGETI and m_freem now.


# 1.65 03-Oct-2014 dlg

dont take the kernel lock on every interrupt in case we might change
the link state or to clear OACTIVE, just take it when we know we
really are going to do those things.


# 1.64 14-Sep-2014 jsg

remove unneeded proc.h includes
ok mpi@ kspillner@


# 1.63 19-Aug-2014 dlg

in myx_start, replace

	while (space) {
		IFQ_POLL;
		myx_dequeue(free descr);
		IFQ_DEQUEUE;
		etc;
	}

with

	while (space && myx_dequeue(free descr)) {
		IFQ_DEQUEUE;
		etc;
	}


# 1.62 18-Aug-2014 dlg

dont rely on mbuf.h to provide pool.h.

ok miod@, who has offered to help with any MD fallout
ok guenther@


Revision tags: OPENBSD_5_6_BASE
# 1.61 12-Jul-2014 tedu

add a size argument to free. will be used soon, but for now default to 0.
after discussions with beck deraadt kettenis.


# 1.60 10-Jul-2014 dlg

rings that dont rx packets dont need to be refilled.


# 1.59 08-Jul-2014 dlg

cut things that relied on mclgeti for rx ring accounting/restriction over
to using if_rxr.

cut the reporting systat did over to the rxr ioctl.

tested as much as i can on alpha, amd64, and sparc64.
mpi@ has run it on macppc.
ok mpi@


# 1.58 17-Jun-2014 dlg

whitespace fix.

im sick of fixing this by hand on all my boxes while hacking on
other stuff and having it pollute my diffs.

no functional change.


# 1.57 24-Mar-2014 dlg

nothing after the irq ack posting relies on it being ordered.


Revision tags: OPENBSD_5_5_BASE
# 1.56 10-Feb-2014 dlg

the mac addresses you program with MYXCMD_SET_MCASTGROUP are in a different
format to the one used for MYXCMD_SET_LLADDR. for reasons.

this lets ospf work if you dont happen to have PROMISC enabled on your
interface like my production firewalls happen to have, which is why i
never noticed this before.


# 1.55 05-Feb-2014 dlg

after running myx(4) without biglock in production for a few days
i discovered that there's a race between the interrupt code and
myx_start which causes the count of free tx descriptors to get
distorted, which eventually leads to a permanent setting of
IFF_OACTIVE, which in turn prevents the driver from transmitting
packets.

fixing that went horribly wrong when i then discovered that there's
a race between the interrupt handler and myx_down, where the interrupt
can tell myx_down to wake up and free all the rings while the
interrupt handler is still looking at them. free panics for all.

this moves the handling of the tx free count under the biglock (for
now), and moves myx_up and myx_down to managing a "driver state"
variable independently of the IFF_UP and IFF_RUNNING flags, and
very very careful reordering of the checks of that state variable
and the hardware state.

as a bonus we get to avoid excessive calls to myx_txeof and myx_rxeof
in the isr, and less stuff checked unconditionally. on the other
hand, the sc_state handling added some more checks so it might not
be a win overall.

tested on smp sparc64 with msi and nonmsi interrupts, and on amd64 smp
in production again.


# 1.54 31-Jan-2014 dlg

sc_function is set, but never used for anything useful. clean it up...


# 1.53 31-Jan-2014 dlg

sc_lladdr is never used, so we can get the space in the sc back.


# 1.52 23-Jan-2014 dlg

a lot of people have pointed out to me that taking a lock just to read an
int isnt necessary.


# 1.51 23-Jan-2014 dlg

factor the mutex/bus_space handling of the sts block out.


# 1.50 21-Jan-2014 dlg

introduce fine grained locking.

this doesnt give up the big lock coming from process context, only from
the interrupt side. it is excessively careful about when it takes
the big lock again. notably it goes to a lot of effort to not hold
a mutex while calling into other subsystems or before taking the
big lock.

ive been hitting it as hard as i can without problems.

intensely read by mpi@
ok claudio@ kettenis@


# 1.49 19-Jan-2014 dlg

white space fix


# 1.48 19-Jan-2014 dlg

introduce fine grained locking around the lists of packet handlers
myx maintains. this moves it away from relying on splnet to protect
them.


# 1.47 19-Jan-2014 dlg

hwflags is never used, so clean it up


# 1.46 19-Jan-2014 dlg

replace bcmp with memcmp


# 1.45 19-Jan-2014 dlg

bcopy to memcpy


# 1.44 19-Jan-2014 dlg

replace bzero with memset.


# 1.43 19-Jan-2014 dlg

all 64bit archs myx runs on support bus_space 8 things because of work i
did at n2k13.


Revision tags: OPENBSD_5_3_BASE OPENBSD_5_4_BASE
# 1.42 29-Jan-2013 brad

- Set ENETRESET within myx_ioctl() instead of calling myx_iff() directly, to be
consistent with other drivers.
- Clear IFF_ALLMULTI flag early and at the top of myx_iff().
- Set IFF_ALLMULTI when in promisc mode.

ok dlg@


# 1.41 25-Jan-2013 dlg

we go to a lot of effort to post the first tx descriptor last, but we
really should be trying to post everything except the flags field in the
first tx descriptor. this shuffles things around so the rest of that first
txd is posted as part of the "everything else" before its flags field.


# 1.40 25-Jan-2013 dlg

the myx_dmamem struct doesnt need a name.


# 1.39 21-Jan-2013 dlg

myx does reads and writes in one direction to packet buffers. lets try
STREAMING them.


# 1.38 15-Jan-2013 dlg

amd64 is currently broken cos it has no bus_space_write_raw_region_8,
so dont use it. disabling it for now.


# 1.37 15-Jan-2013 dlg

use bus_space_write_raw_region_8 on 64bit archs when writing to the rings


# 1.36 14-Jan-2013 dlg

map the registers PREFETCHABLE so things that can do write combining can
try and do write combining like the myx doco likes.


# 1.35 14-Jan-2013 dlg

avoid extra bus_space barriers in the interrupt handler.


# 1.34 14-Jan-2013 dlg

when posting descriptors to the chips rings, avoid going write barrier
write barrier write barrier when using myx_write to post descriptors.

instead let it go write write write barrier by using the appropriate
bus_space write directly followed by a single bus_space barrier.

the story above is mostly true, except that myx wants us to write all the
descriptors except the first, barrier, and then write the first one out to
signal that the chip can proceed.

it is also worth noting that the barriers cover more address space than
what we actually wrote to. this makes the code much simpler, and avoids
generating extra fence operations (which is what barrier functions end up
as on most of our archs) when we wrap around the end of the ring. the
bus_space doco encourages this.

bus_space use was discussed with krw@ kettenis@ deraadt@
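
The ordering described above (everything but the first descriptor, one barrier, then the first descriptor) can be modelled in userland roughly as follows. The ring is a plain array and atomic_thread_fence() is only a stand-in for a bus_space barrier; the descriptor layout, RING_SLOTS, and post_chain() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 16	/* hypothetical ring size */

struct desc {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
};

/*
 * Copy n descriptors into the ring starting at slot `first`, wrapping at
 * the end of the ring. Descriptors 1..n-1 are written first, followed by
 * a single barrier, and only then descriptor 0, which in this model is
 * what tells the chip the chain is ready.
 */
static void
post_chain(struct desc *ring, unsigned int first, const struct desc *d,
    size_t n)
{
	size_t i;

	assert(n <= RING_SLOTS);
	if (n == 0)
		return;

	for (i = 1; i < n; i++)
		ring[(first + i) % RING_SLOTS] = d[i];

	/* stand-in for one barrier covering the whole ring */
	atomic_thread_fence(memory_order_release);

	ring[first % RING_SLOTS] = d[0];
}
```

Issuing one fence for the whole ring, rather than one per write or one per wrapped segment, mirrors the point made above about barriers covering more address space than was actually written.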


# 1.33 14-Jan-2013 dlg

the myri doco suggests its nice to post stuff by filling in everything
in the rings except the first descriptor. once you've written as
much as you can out, then you go back and post the first descriptor
to signal that the chip should go ahead and work.


# 1.32 14-Jan-2013 dlg

;; is a long way of saying ;


# 1.31 29-Nov-2012 brad

Remove setting an initial assumed baudrate upon driver attach which is not
necessarily correct, there might not even be a link when attaching.

ok mikeb@ reyk@


Revision tags: OPENBSD_5_1_BASE OPENBSD_5_2_BASE
# 1.30 28-Nov-2011 blambert

Fix reversed error-handling gotos in myx_buf_fill(), which would lead to
either an mbuf leak or a NULL pointer dereference.

ok sthen@ claudio@ dlg@
testing claudio@ dlg@


Revision tags: OPENBSD_5_0_BASE
# 1.29 08-Aug-2011 dlg

myx requires the driver pad short ethernet frames to 60 bytes by
adding a descriptor pointing at zeroed bytes onto the end of transmit
chains. i was accounting for this extra descriptor when i was
completing the chain, but not when i was setting this up. this
meant the number of free descriptors kept growing until it overflowed.
at this point the check for space in the ring failed and packets
no longer flowed.

this counts the pad descriptor in the tx chain setup too.

ok deraadt@
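
A small model of the accounting fix: the same descriptor count, including the pad descriptor for frames shorter than 60 bytes, is charged at setup and returned at completion. The names here are illustrative assumptions, not the driver's.

```c
#include <assert.h>

#define TX_MIN_LEN 60	/* short frames are padded up to this length */

struct tx_ring {
	unsigned int total;	/* descriptors in the ring */
	unsigned int free;	/* descriptors currently unused */
};

/* Descriptors a chain needs: one per segment, plus one pad if short. */
static unsigned int
tx_chain_descs(unsigned int nsegs, unsigned int pktlen)
{
	return (nsegs + (pktlen < TX_MIN_LEN ? 1 : 0));
}

/* Reserve descriptors for a chain. Returns 0 if the ring lacks space. */
static int
tx_setup(struct tx_ring *r, unsigned int nsegs, unsigned int pktlen)
{
	unsigned int need = tx_chain_descs(nsegs, pktlen);

	if (need > r->free)
		return (0);
	r->free -= need;
	return (1);
}

/* Return exactly what tx_setup() took for the same chain. */
static void
tx_complete(struct tx_ring *r, unsigned int nsegs, unsigned int pktlen)
{
	unsigned int n = tx_chain_descs(nsegs, pktlen);

	assert(r->free + n <= r->total);
	r->free += n;
}
```

Routing both paths through tx_chain_descs() keeps the free count balanced; the bug described above corresponds to setup charging only nsegs while completion returned nsegs + 1.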


# 1.28 23-Jun-2011 dlg

cope with empty rx rings by scheduling a timeout to keep trying until it
gets some packets onto the rings.

also annoying, but the hardware doesnt report empty rings, we have to
handle it ourselves.


# 1.27 23-Jun-2011 dlg

this chip has an annoying "feature" where it cannot report the link
state unless the chip is up and handling packets. while its down
it does not report the link state, so it is unknown.

this tweaks the link state handling, in particular it adds code to
myx_down so it moves the link state to unknown, ie, it correctly
reflects reality.

stupidity pointed out by deraadt


# 1.26 22-Jun-2011 deraadt

reset the tx_count on UP, since it may have been advanced from non-zero
by a previous use
ok claudio


# 1.25 22-Jun-2011 dlg

msi support. this is a complicated one...

ok kettenis@


# 1.24 22-Jun-2011 jsg

another myri10ge device matched by freebsd/linux drivers
ok dlg@


# 1.23 22-Jun-2011 dlg

oops, handle refill like i said i was going to two revisions ago.


# 1.22 22-Jun-2011 deraadt

set the mac address on the chip correctly (repair the byte order)
it now works on sparc64, too
ok dlg


# 1.21 22-Jun-2011 dlg

deraadt plugged his myx into a sparc64 and discovered 3 problems:

1. we want to write raw values to registers all the time, so promote the
myx_raw{read,write} to myx_{read,write} and use them everywhere. get rid
of the raw funcs.
2. i was setting the watermarks on the rx ring before knowing how big
they were.
3. rxfill in the interrupt handler could lose data if you loop on
sts_isvalid.

almost working now...

"please commit your diff" deraadt@


# 1.20 21-Jun-2011 dlg

do the unaligned dma tests so we can figure out if we need to fall
back to the unaligned firmware. apparently this is only an issue
on the "A" controllers which have been superseded, but those are
the chips we (openbsd devs) have.


# 1.19 21-Jun-2011 dlg

report the controllers part number. eg, i now know i have a
10G-PCIE-8A-R. dmesg looks like this:

myx0 at pci4 dev 0 function 0 "Myricom Z8E" rev 0x00: apic 1 int 8, model 10G-PCIE-8A-R, address 00:60:dd:47:c6:74


# 1.18 21-Jun-2011 dlg

wire up jumbos properly. the hardware supports up to 9018 bytes off
the wire (9000 + ether header + vlan tag), but has some cool alignment
requirements. if you want to use a single rx ring desc to point at
a jumbo it needs to start on a 4k boundary and be physically
contiguous. to ensure this im pulling frames from the 12k pool and
waiting for arianes diff to ensure mbufs are contig.

direction from andrew gallatin. tested locally.
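
The 9018 figure above is just 9000 bytes of payload plus a 14-byte ethernet header and a 4-byte vlan tag. A short sketch of that arithmetic, with hypothetical macro and function names, also shows why a 12k buffer is enough while an 8k one is not:

```c
#include <assert.h>
#include <stddef.h>

#define JUMBO_PAYLOAD	9000	/* payload size from the commit message */
#define ETHER_HDR	14	/* dst + src + ethertype */
#define VLAN_TAG	4	/* 802.1Q tag */
#define JUMBO_FRAME	(JUMBO_PAYLOAD + ETHER_HDR + VLAN_TAG)

/* Bytes a frame with the given payload occupies in the rx buffer. */
static size_t
frame_len(size_t payload)
{
	assert(payload <= JUMBO_PAYLOAD);
	return (payload + ETHER_HDR + VLAN_TAG);
}

/* True when one buffer of bufsize bytes can hold the largest frame. */
static int
buf_holds_jumbo(size_t bufsize)
{
	return (bufsize >= JUMBO_FRAME);
}
```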


# 1.17 21-Jun-2011 deraadt

minor cleanups; ok dlg


# 1.16 20-Jun-2011 dlg

make the interrupt handler look more like what the doco suggests. seems to
fix a bad lockup i kept getting.


# 1.15 20-Jun-2011 dlg

dont need debug, the myx_cmd stuff works fine.


# 1.14 20-Jun-2011 dlg

i got myx working!


# 1.13 02-May-2011 chl

Do not check malloc return value against NULL, as M_WAITOK is used.

ok dlg@ krw@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE
# 1.12 19-May-2010 oga

BUS_DMA_ZERO instead of alloc, map, bzero.

ok krw@


Revision tags: OPENBSD_4_7_BASE
# 1.11 13-Aug-2009 jasper

- consistify cfdriver for the ethernet drivers (0 -> NULL)

ok dlg@


Revision tags: OPENBSD_4_5_BASE OPENBSD_4_6_BASE
# 1.10 28-Nov-2008 brad

Eliminate the redundant bits of code for MTU and multicast handling
from the individual drivers now that ether_ioctl() handles this.

Shrinks the i386 kernels by..
RAMDISK - 2176 bytes
RAMDISKB - 1504 bytes
RAMDISKC - 736 bytes

Tested by naddy@/okan@/sthen@/brad@/todd@/jmc@ and lots of users.
Build tested on almost all archs by todd@/brad@

ok naddy@


# 1.9 02-Oct-2008 brad

First step towards cleaning up the Ethernet driver ioctl handling.
Move calling ether_ioctl() from the top of the ioctl function, which
at the moment does absolutely nothing, to the default switch case.
Thus allowing drivers to define their own ioctl handlers and then
falling back on ether_ioctl(). The only functional change this results
in at the moment is having all Ethernet drivers returning the proper
errno of ENOTTY instead of EINVAL/ENXIO when encountering unknown
ioctl's.

Shrinks the i386 kernels by..
RAMDISK - 1024 bytes
RAMDISKB - 1120 bytes
RAMDISKC - 832 bytes

Tested by martin@/jsing@/todd@/brad@
Build tested on almost all archs by todd@/brad@

ok jsing@


# 1.8 10-Sep-2008 blambert

Convert timeout_add() calls using multiples of hz to timeout_add_sec()

Really just the low-hanging fruit of (hopefully) forthcoming timeout
conversions.

ok art@, krw@


Revision tags: OPENBSD_4_4_BASE
# 1.7 23-May-2008 brad

Simplify the combination use of pci_mapreg_type()/pci_mapreg_map() as
suggested by dlg@ awhile ago.

ok dlg@


Revision tags: OPENBSD_4_3_BASE
# 1.6 16-Jan-2008 thib

Set the baudrate with IF_Gbps(10); and remove an
XXX comment now that if_baudrate is 64bits.

ok reyk@


Revision tags: OPENBSD_4_2_BASE
# 1.5 01-Jun-2007 reyk

initialize the rings


# 1.4 31-May-2007 reyk

further improvement of the bus space i/o. firmware loading, booting,
and card initialization works now.

thanks to dlg@ who pointed me to the fact that
bus_space_write_region_N and bus_space_write_raw_region_N use count of
elements vs. size of buffer arguments.
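
To make the count-versus-size distinction concrete, here is a userland model with two hypothetical stand-in functions. They only mimic the meaning of the last argument and are not the real bus_space functions, which also take a tag, handle, and offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: `count` is a number of 4-byte elements. */
static void
write_region_4(uint8_t *dst, const uint32_t *src, size_t count)
{
	memcpy(dst, src, count * sizeof(uint32_t));
}

/* Hypothetical stand-in: `size` is the buffer length in bytes. */
static void
write_raw_region_4(uint8_t *dst, const uint8_t *src, size_t size)
{
	assert(size % sizeof(uint32_t) == 0);
	memcpy(dst, src, size);
}
```

For a `uint32_t fw[4]` buffer the two calls that transfer the same data are `write_region_4(dst, fw, 4)` and `write_raw_region_4(dst, (const uint8_t *)fw, 16)`. Passing a byte size where an element count is expected would overrun the source by a factor of four.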


# 1.3 31-May-2007 reyk

enable all debugging messages by default if the driver is compiled with
MYX_DEBUG


# 1.2 31-May-2007 reyk

fix the myx_write function


# 1.1 31-May-2007 reyk

initial bits of a new driver for the Myricom Myri-10G Lanai-Z8E 10Gb
Ethernet chipset. not working yet.

ok dlg@