#
362511 |
|
22-Jun-2020 |
freqlabs |
MFC r362201:
Avoid trying to toggle TSO twice
Remove TSO from the toggle mask when automatically disabled by TXCKSUM* in various NIC drivers.
Reviewed by: hselasky, np, gallatin, jpaetzel Approved by: mav (mentor) Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25120
|
#
331722 |
|
29-Mar-2018 |
eadler |
Revert r330897:
This was intended to be a non-functional change. It wasn't. The commit message was thus wrong. In addition it broke arm, and merged crypto related code.
Revert with prejudice.
This revert skips files touched in r316370 since that commit was since MFCed. This revert also skips files that require $FreeBSD$ property changes.
Thank you to those who helped me get out of this mess including but not limited to gonzo, kevans, rgrimes.
Requested by: gjb (re)
|
#
330897 |
|
14-Mar-2018 |
eadler |
Partial merge of the SPDX changes
These changes are incomplete but are making it difficult to determine what other changes can/should be merged.
No objections from: pfg
|
#
329833 |
|
22-Feb-2018 |
rpokala |
MFC r329295:
Panasas discovered that ioctl(SIOCGLAGGPORT) returns ENOTTY for mxge(4) when the NIC is not a member of a lagg. This came as a surprise, because the SIOCGLAGGPORT handler in if_lagg.c only returns ENOENT (if run against the laggX interface, rather than a physical port) or EINVAL (if run against a non-member physical port). This behavior was not seen with other drivers, such as bge(4), igb(4), and cxl(4). When I compared their respective ioctl handlers, I found that they all called ether_ioctl() for the default (i.e. unhandled) case; by contrast, mxge(4) only calls ether_ioctl() for two specific cases, and returns ENOTTY for the default case.
Remove the two cases which explicitly call ether_ioctl(), and let the default case call it instead. This matches what the vast majority of the NIC drivers do.
|
#
302408 |
|
07-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
299506 |
|
12-May-2016 |
sephe |
mxge: Setup mbuf flowid before calling tcp_lro_rx().
Reviewed by: gallatin MFC after: 1 week Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D6320
|
#
298307 |
|
19-Apr-2016 |
pfg |
sys/dev: use our nitems() macro when it is avaliable through param.h.
No functional change, only trivial cases are done in this sweep, Drivers that can get further enhancements will be done independently.
Discussed in: freebsd-current
|
#
297482 |
|
01-Apr-2016 |
sephe |
tcp/lro: Use tcp_lro_flush_all in device drivers to avoid code duplication
And factor out tcp_lro_rx_done, which deduplicates the same logic with netinet/tcp_lro.c
Reviewed by: gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com> Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5725
|
#
297000 |
|
17-Mar-2016 |
jhibbits |
Use uintmax_t (typedef'd to rman_res_t type) for rman ranges.
On some architectures, u_long isn't large enough for resource definitions. Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but type `long' is only 32-bit. This extends rman's resources to uintmax_t. With this change, any resource can feasibly be placed anywhere in physical memory (within the constraints of the driver).
Why uintmax_t and not something machine dependent, or uint64_t? Though it's possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on 32-bit architectures. 64-bit architectures should have plenty of RAM to absorb the increase on resource sizes if and when this occurs, and the number of resources on memory-constrained systems should be sufficiently small as to not pose a drastic overhead. That being said, uintmax_t was chosen for source clarity. If it's specified as uint64_t, all printf()-like calls would either need casts to uintmax_t, or be littered with PRI*64 macros. Casts to uintmax_t aren't horrible, but it would also bake into the API for resource_list_print_type() either a hidden assumption that entries get cast to uintmax_t for printing, or these calls would need the PRI*64 macros. Since source code is meant to be read more often than written, I chose the clearest path of simply using uintmax_t.
Tested on a PowerPC p5020-based board, which places all device resources in 0xfxxxxxxxx, and has 8GB RAM. Regression tested on qemu-system-i386 Regression tested on qemu-system-mips (malta profile)
Tested PAE and devinfo on virtualbox (live CD)
Special thanks to bz for his testing on ARM.
Reviewed By: bz, jhb (previous) Relnotes: Yes Sponsored by: Alex Perez/Inertial Computing Differential Revision: https://reviews.freebsd.org/D4544
|
#
295790 |
|
19-Feb-2016 |
jhibbits |
Replace several bus_alloc_resource() calls using default arguments with bus_alloc_resource_any()
Since these calls only use default arguments, bus_alloc_resource_any() is the right call.
Differential Revision: https://reviews.freebsd.org/D5306
|
#
294327 |
|
19-Jan-2016 |
hselasky |
Add optimizing LRO wrapper:
- Add optimizing LRO wrapper which pre-sorts all incoming packets according to the hash type and flowid. This prevents exhaustion of the LRO entries due to too many connections at the same time. Testing using a larger number of higher bandwidth TCP connections showed that the incoming ACK packet aggregation rate increased from ~1.3:1 to almost 3:1. Another test showed that for a number of TCP connections greater than 16 per hardware receive ring, where 8 TCP connections was the LRO active entry limit, there was a significant improvement in throughput due to being able to fully aggregate more than 8 TCP stream. For very few very high bandwidth TCP streams, the optimizing LRO wrapper will add CPU usage instead of reducing CPU usage. This is expected. Network drivers which want to use the optimizing LRO wrapper needs to call "tcp_lro_queue_mbuf()" instead of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of "tcp_lro_flush()". Further the LRO control structure must be initialized using "tcp_lro_init_args()" passing a non-zero number into the "lro_mbufs" argument.
- Make LRO statistics 64-bit. Previously 32-bit integers were used for statistics which can be prone to wrap-around. Fix this while at it and update all SYSCTL's which expose LRO statistics.
- Ensure all data is freed when destroying a LRO control structures, especially leftover LRO entries.
- Reduce number of memory allocations needed when setting up a LRO control structure by precomputing the total amount of memory needed.
- Add own memory allocation counter for LRO.
- Bump the FreeBSD version to force recompilation of all KLDs due to change of the LRO control structure size.
Sponsored by: Mellanox Technologies Reviewed by: gallatin, sbruno, rrs, gnn, transport Tested by: Netflix Differential Revision: https://reviews.freebsd.org/D4914
|
#
281855 |
|
22-Apr-2015 |
rodrigc |
Move zlib.c from net to libkern.
It is not network-specific code and would be better as part of libkern instead. Move zlib.h and zutil.h from net/ to sys/ Update includes to use sys/zlib.h and sys/zutil.h instead of net/
Submitted by: Steve Kiernan stevek@juniper.net Obtained from: Juniper Networks, Inc. GitHub Pull Request: https://github.com/freebsd/freebsd/pull/28 Relnotes: yes
|
#
275358 |
|
01-Dec-2014 |
hselasky |
Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but does no longer have any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file.
This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows.
"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before.
Additional notes: - The SCTP code changes will be committed as a separate patch. - Removal of the "M_FLOWID" flag will also be done separately. - The FreeBSD version has been bumped.
MFC after: 1 month Sponsored by: Mellanox Technologies
|
#
273377 |
|
21-Oct-2014 |
hselasky |
Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.
- Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes.
- Logical OR where binary OR was expected.
- Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs.
- Properly assert the the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function.
- Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement.
- Updated "EXAMPLES" section in SYSCTL manual page.
MFC after: 3 days Sponsored by: Mellanox Technologies
|
#
272091 |
|
25-Sep-2014 |
glebius |
Whitespace cleanup.
|
#
272090 |
|
25-Sep-2014 |
glebius |
- Provide mxge_get_counter() to return counters that are not collected, but taken from hardware. - Mechanically convert to if_inc_counter() the rest of counters.
|
#
271856 |
|
19-Sep-2014 |
glebius |
Remove ifq_drops from struct ifqueue. Now queue drops are accounted in struct ifnet if_oqdrops.
Some netgraph modules used ifqueue w/o ifnet. Accounting of queue drops is simply removed from them. There were no API to read this statistic.
Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
263102 |
|
13-Mar-2014 |
glebius |
Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit interface, in the r241616 a crutch was provided. It didn't work well, and finally we decided that it is time to break ABI and simply make if_baudrate a 64-bit value. Meanwhile, the entire struct if_data was reviewed.
o Remove the if_baudrate_pf crutch.
o Make all fields of struct if_data fixed machine independent size. The notion of data (packet counters, etc) are by no means MD. And it is a bug that on amd64 we've got a 64-bit counters, while on i386 32-bit, which at modern speeds overflow within a second.
This also removes quite a lot of COMPAT_FREEBSD32 code.
o Give 16 bit for the ifi_datalen field. This field was provided to make future changes to if_data less ABI breaking. Unfortunately the 8 bit size of it had effectively limited sizeof if_data to 256 bytes.
o Give 32 bits to ifi_mtu and ifi_metric. o Give 64 bits to the rest of fields, since they are counters.
__FreeBSD_version bumped.
Discussed with: emax Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
257176 |
|
26-Oct-2013 |
glebius |
The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
254263 |
|
12-Aug-2013 |
scottl |
Update PCI drivers to no longer look at the MEMIO-enabled bit in the PCI command register. The lazy BAR allocation code in FreeBSD sometimes disables this bit when it detects a range conflict, and will re-enable it on demand when a driver allocates the BAR. Thus, the bit is no longer a reliable indication of capability, and should not be checked. This results in the elimination of a lot of code from drivers, and also gives the opportunity to simplify a lot of drivers to use a helper API to set the busmaster enable bit.
This changes fixes some recent reports of disk controllers and their associated drives/enclosures disappearing during boot.
Submitted by: jhb Reviewed by: jfv, marius, achadd, achim MFC after: 1 day
|
#
249586 |
|
17-Apr-2013 |
gabor |
- Correct mispellings of word resource
Submitted by: Christoph Mallon <christoph.mallon@gmx.de>
|
#
247268 |
|
25-Feb-2013 |
gallatin |
Several cleanups and fixes to mxge:
- Remove vestigial null pointer tests after malloc(..., M_WAITOK).
- Remove vestigal qualhack union
- Use strlcpy() instead of the error-prone strncpy() when parsing EEPROM and copying strings
- Check the MAC address in the EEPROM strings more strictly.
- Expand the macro MXGE_NEXT_STRING() at its only user. Due to a typo, the macro was very confusing.
- Remove unnecessary buffer limit check. The buffer is double-NUL terminated per construction.
PR: kern/176369 Submitted by: Christoph Mallon <christoph.mallon gmx.de>
|
#
247160 |
|
22-Feb-2013 |
gallatin |
Bump mxge copyright.
Sponsored by: Myricom
MFC After: 7 days
|
#
247159 |
|
22-Feb-2013 |
gallatin |
Improvements for newer mxge nics:
- Some mxge nics may store the serial number in the SN2 field of the EEPROM. These will also have an SN=0 field, so parse the SN2 field, and give it precedence.
- Skip MXGEFW_CMD_UNALIGNED_TEST on mxge nics which do not require it. This saves roughly 10ms per port at device attach time.
Sponsored by: Myricom
MFC After: 7 days
|
#
247152 |
|
22-Feb-2013 |
gallatin |
Try harder to make mxge safe for all combinations of INET and INET6
- Re-fix build by restoring local removed in r247151, but protected by #if defined(INET) || defined(INET6) so that the compile succeeds in the !(INET||INET6) case.
- Protect call to in_pseudo() with an #ifdef INET, to allow a kernel to link with mxge when INET is not compiled in.
- Also remove an errant (improperly commented) obsolete debugging printf
Thanks to Glebius for pointing out the !(INET||INET6) build issue.
Sponsored by: Myricom
MFC After: 7 days
|
#
247151 |
|
22-Feb-2013 |
glebius |
Fix build.
|
#
247133 |
|
21-Feb-2013 |
gallatin |
Improve mxge's receive performance for IPv6:
- Add support for IPv6 rx csum offload - Finally switch mxge from using its own driver lro, to using tcp_lro
MFC after: 7 days Sponsored by: Myricom Inc.
|
#
247011 |
|
19-Feb-2013 |
gallatin |
Add support to mxge for IPv6 TX csum offload & IPv6 TSO.
Sponsored by: Myricom, Inc. MFC after: 7 days
|
#
246128 |
|
30-Jan-2013 |
sbz |
Use DEVMETHOD_END macro defined in sys/bus.h instead of {0, 0} sentinel on device_method_t arrays
Reviewed by: cognet Approved by: cognet
|
#
243857 |
|
04-Dec-2012 |
glebius |
Mechanically substitute flags from historic mbuf allocator with malloc(9) flags in sys/dev.
|
#
241687 |
|
18-Oct-2012 |
glebius |
Utilize new macro to initialize if_baudrate.
|
#
241037 |
|
28-Sep-2012 |
glebius |
The drbr(9) API appeared to be so unclear, that most drivers in tree used it incorrectly, which lead to inaccurate overrated if_obytes accounting. The drbr(9) used to update ifnet stats on drbr_enqueue(), which is not accurate since enqueuing doesn't imply successful processing by driver. Dequeuing neither mean that. Most drivers also called drbr_stats_update() which did accounting again, leading to doubled if_obytes statistics. And in case of severe transmitting, when a packet could be several times enqueued and dequeued it could have been accounted several times.
o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between ALTQ queueing or buf_ring(9) queueing. - It doesn't touch the buf_ring stats any more. - It doesn't touch ifnet stats anymore. - drbr_stats_update() no longer exists.
o buf_ring(9) handles its stats itself: - It handles br_drops itself. - br_prod_bytes stats are dropped. Rationale: no one ever reads them but update of a common counter on every packet negatively affects performance due to excessive cache invalidation. - buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since we no longer account bytes.
o Drivers handle their stats theirselves: if_obytes, if_omcasts.
o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer use drbr_stats_update(), and update ifnet stats theirselves.
o bxe(4) was the most correct driver, it didn't call drbr_stats_update(), thus it was the only driver accurate under moderate load. Now it also maintains stats itself.
o ixgbe(4) had already taken stats from hardware, so just - drop software stats updating. - take multicast packet count from hardware as well.
o mxge(4) just no longer needs NO_SLOW_STATS define.
o cxgb(4), cxgbe(4) need no change, since they obtain stats from hardware.
Reviewed by: jfv, gnn
|
#
232874 |
|
12-Mar-2012 |
scottl |
More conversions of drivers to use the PCI parent DMA tag.
|
#
229272 |
|
02-Jan-2012 |
ed |
Use strchr() and strrchr().
It seems strchr() and strrchr() are used more often than index() and rindex(). Therefore, simply migrate all kernel code to use it.
For the XFS code, remove an empty line to make the code identical to the code in the Linux kernel.
|
#
223957 |
|
12-Jul-2011 |
gallatin |
Fix media reporting for dual port CX4 myri10ge NICs
MFC after: 7 days Sponsored by: Myricom, Inc.
|
#
220385 |
|
06-Apr-2011 |
gallatin |
Implement mxge_init()
This fixes a long standing bug in mxge(4) where "ifconfig mxge0 $IP" did not bring the interface into a RUNNING state, like it does on most (all?) other FreeBSD NIC drivers.
Thanks to gnn for mentioning the bug, and yongari for pointing out that ether_ioctl() invokes ifp->if_init() in SIOCSIFADDR.
MFC after: 7 days
|
#
219902 |
|
23-Mar-2011 |
jhb |
Do a sweep of the tree replacing calls to pci_find_extcap() with calls to pci_find_cap() instead.
|
#
217104 |
|
07-Jan-2011 |
jhb |
Use a regular taskqueue rather than a fast taskqueue for mxge(4).
Reviewed by: gallatin
|
#
215686 |
|
22-Nov-2010 |
gallatin |
Fix a TSO checksum bug on mxge(4):
The Myri10GE NIC will assume all TSO frames contain partial checksum, and will emit TSO segments with bad TCP checksums if a TSO frame contains a full checksum. The mxge driver takes care to make sure that TSO is disabled when checksum offload is disabled for this reason. However, modules that modify packet contents (like pf) may end up completing a checksum on a TSO frame, leading to the NIC emitting TSO segments with bad checksums.
To workaround this, restore the partial checksum in the mxge driver when we're fed a TSO frame with a full checksum.
Reported by: Bob Healey
MFC after: 3 days
|
#
208379 |
|
21-May-2010 |
gallatin |
Add interrupt descriptions for mxge's msi-x vectors
|
#
208312 |
|
19-May-2010 |
gallatin |
Correctly identify some twinax cables, which report a media type of 1.
|
#
207761 |
|
07-May-2010 |
fabient |
Add a fastpath to allocate from packet zone when using m_getjcl. This will add support for packet zone for at least igb and ixgbe and will avoid to check for that in bce and mxge.
MFC after: 1 week
|
#
206663 |
|
15-Apr-2010 |
gallatin |
Add missing IFCAP_LINKSTATE to mxge
Submitted by: yongari
|
#
206662 |
|
15-Apr-2010 |
gallatin |
Cleanup if_media handling in mxge(4)
- Re-probe xfp / sfp+ socket on link events, in case user has changed transceiver - correctly report current media to avoid confusing lagg (reported by Panasas) - Report link speed (submitted by yongari)
Reviewed by: yongari (earlier version)
MFC after: 7 days
|
#
205255 |
|
17-Mar-2010 |
gallatin |
Fix 2 bugs in mxge_attach()
- Don't leak slice resources when mxge_alloc_rings() fails
- Start taskq threads only after we know attach will succeed. At boot time, taskqueue_terminate() will loop infinately, waiting for the threads to exit, and hang the system.
Submitted by: Panasas MFC After: 3 days
|
#
204212 |
|
22-Feb-2010 |
gallatin |
Update mxge to support IFCAP_VLAN_HWTSO.
Note: If/when FreeBSD supports TSO over IPv6, the minimal mxge fw rev to enable IFCAP_VLAN_HWTSO will need to be increased to 1.4.37
|
#
203834 |
|
13-Feb-2010 |
mlaier |
Fix drbr and altq interaction: - introduce drbr_needs_enqueue that returns whether the interface/br needs an enqueue operation: returns true if altq is enabled or there are already packets in the ring (as we need to maintain packet order) - update all drbr consumers - fix drbr_flush - avoid using the driver queue (IFQ_DRV_*) in the altq case as the multiqueue consumer does not provide enough protection, serialize altq interaction with the main queue lock - make drbr_dequeue_cond work with altq
Discussed with: kmacy, yongari, jfv MFC after: 4 weeks
|
#
202121 |
|
11-Jan-2010 |
gallatin |
Use better default RSS hash (src + dst, rather than just src port)
MFC after:3 days
|
#
202119 |
|
11-Jan-2010 |
gallatin |
Fix reporting of 10G Twinax media
Reported by: mjacob MFC after: 3 days
|
#
201758 |
|
07-Jan-2010 |
mbr |
Remove extraneous semicolons, no functional changes.
Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week
|
#
200845 |
|
22-Dec-2009 |
gallatin |
Don't take the driver mutex in mxge_tick(), as it is run with the mutex held.
Submitted by: rwatson MFC after: 3 days
|
#
198303 |
|
20-Oct-2009 |
gallatin |
Make mxge do a better job recovering from NIC h/w faults by checking PCI config space when the NIC is not transmitting. Previously, a h/w fault would not have been detected if the NIC was down, or handling an RX only workload.
|
#
198250 |
|
19-Oct-2009 |
gallatin |
Move mxge(4)'s NIC watchdog reset handler from a callout to a taskqueue
|
#
197645 |
|
30-Sep-2009 |
gallatin |
Two more mxge watchdog fixes:
1) Restore the PCI Express control register after a watchdog reset. This is required because the device will come out of watchdog reset with the pectl reg at its default state, and important BIOS configuration (like max payload size) could be lost.
2) Call mxge_start_locked() for every tx queue before dropping the lock in the watchdog handler. This is required, as the queue's buf ring may have filled during the reset.
|
#
197395 |
|
21-Sep-2009 |
gallatin |
Improve mxge watchdog routine's ability to reliably reset a failed NIC:
- Mark the link as down, so if watchdog reset fails, link watching failover software can notice it - Don't send MXGEFW_CMD_ETHERNET_DOWN if the NIC has been reset, it is not needed, and will fail on a freshly reset NIC. - Ensure the transmit routines aren't attempting to PIO write to doorbells while the NIC is being reset. - Download the correct f/w, rather than using the EEPROM f/w after reset. - Export a count of the number of watchdog resets via sysctl - Zero all f/w stats at reset. This will lead to less confusing diagnostic output when investigating NIC failures.
MFC after: 3 days
|
#
197391 |
|
21-Sep-2009 |
gallatin |
Add support for throttling transmit bandwidth. This is most commonly used to reduce packet loss on high delay (WAN) paths with a slow link.
|
#
195818 |
|
22-Jul-2009 |
gallatin |
mxge's tunable hw.mxge.rss_hash_type cannot be set from the loader, because it uses a reserved suffix (_type). Fix this by removing the "_" and renaming the tunable to hw.mxge.rss_hashtype. The old (rss_hash_type) tunable is still fetched, in case people load the driver via scripts. When both are present in the kernel environment, the new value (hw.mxge.rss_hashtype) overrides the old value.
Approved by: re (kib)
|
#
195049 |
|
26-Jun-2009 |
rwatson |
Use if_maddr_rlock()/if_maddr_runlock() rather than IF_ADDR_LOCK()/ IF_ADDR_UNLOCK() across network device drivers when accessing the per-interface multicast address list, if_multiaddrs. This will allow us to change the locking strategy without affecting our driver programming interface or binary interface.
For two wireless drivers, remove unnecessary locking, since they don't actually access the multicast address list.
Approved by: re (kib) MFC after: 6 weeks
|
#
194909 |
|
24-Jun-2009 |
gallatin |
Add a dying flag to prevent races at detach.
I tried re-ordering ether_ifdetach(), but this created a new race where sometimes, when under heavy receive load (>1Mpps) and running tcpdump, the machine would panic. At panic, the ithread was still in the original (not dead) if_input() path, and was accessing stale BPF data structs. By using a dying flag, I can close the interface prior to if_detach() to be certain the interface cannot send packets up in the middle of ether_ifdetach.
|
#
194836 |
|
24-Jun-2009 |
gallatin |
Allow admin to specify the initial mtu upon driver load for mxge.
|
#
194761 |
|
23-Jun-2009 |
gallatin |
- Fix bug where device would loose promisc setting when reset. - Allow all rss hash modes to be chosen
|
#
194751 |
|
23-Jun-2009 |
gallatin |
Revert most of 193311 so as to track mxge transmit stats on a per-ring basis and avoid racy (and costly) updates to the ifp stats via drbr by defining NO_SLOW_STATS
Discussed with: kmacy
|
#
194743 |
|
23-Jun-2009 |
gallatin |
Implement minimal set of changes suggested by bz to make mxge no longer depend on INET.
|
#
193311 |
|
02-Jun-2009 |
gallatin |
Buf-ring fixes for mxge
- always maintain byte/mcast/drop stats via drbr - move #define of IFNET_BUF_RING so that its picked up by all files in the driver - conditionalize IFNET_BUF_RING on the FreeBSD_version bump just after it appeared in the tree.
Sponsored by: Myricom Inc.
|
#
193250 |
|
01-Jun-2009 |
gallatin |
Set an rx jumbo cluster to the correct size before using bus_dmamap_load_mbuf_sg() on it. This prevents data corruption when the mxge MTU is between 4076 and 8172 on machines with 4KB pages and MXGE_VIRT_JUMBOS is in use (which it isn't, in -current or -stable)
|
#
191562 |
|
27-Apr-2009 |
gallatin |
Updates to mxge for multiple tx/rx rings:
- Update mxge to use if_transmit(), and the new buf_ring interfaces, so as to enable multiple transmit queues. Use of if_transmit() is conditional on IFNET_BUF_RING, and is enabled by default (as in if_em).
- Record a flow id on receive if receive hashing is active. I currently only record the rx ring id (0..8) rather than the 32-bit topelitz hash result, as doing the latter would require shifting the driver to use a larger rx return ring.
Sponsored by: Myricom, Inc.
|
#
188737 |
|
17-Feb-2009 |
gallatin |
Fix cut/paste error in previous commit and use the correct value for SFP+ reserved media type.
MFC after: 1 week
|
#
188736 |
|
17-Feb-2009 |
gallatin |
Better support for recent Myricom 10GbE NICs
- Update to firmware 1.4.39 for dual-chip NIC (10G-PCIE2-xxx) support, and SFP+ i2c support
- Identify newer "B" NICs (10G-PCIEx-8B-x) correctly, rather than mis-identifying them as "A" NICs (cosmetic only)
- Identify the IFM_10G_LRM ifmedia type, where applicable.
- Identify ifmedia types for SFP+ based NICs
- Update copyright
Sponsored by: Myricom MFC after: 1 week
|
#
188531 |
|
12-Feb-2009 |
rdivacky |
Remove obsolete C preprocessor assertions.
Approved by: kib (mentor)
|
#
185255 |
|
24-Nov-2008 |
gallatin |
Restore sfence semantics in mxge after the introduction of a global mfence based mb() in r185162
|
#
180567 |
|
17-Jul-2008 |
gallatin |
Clean up mxge's use of callouts as pointed out by jhb, and handle NIC hardware watchdog resets.
- remove buggy code at the top of mxge_tick() which tried to detect a race which is already detected in the kernel's callout code.
- move callout_stop() and callout_reset() into mxge_close() mxge_open() rather than doing the callout manipulation all over the place.
- use callout_drain(), rather than callout_stop() to prevent a potential race between mxge_tick() and mxge_detach() which could lead to softclock using a destroyed mutex
- restructure the mxge_tick() and mxge_watchdog_reset() routines to avoid resetting a callout, and then immediately stopping it if the watchdog reset routine is called, and fails.
- enable the driver to handle NIC hardware watchdog resets by restoring the NIC's PCI config space, which is lost when the NIC hardware watchdog triggers.
Reviewed by: jhb (previus version)
|
#
177862 |
|
02-Apr-2008 |
gallatin |
Initialize if_baudrate using IF_Gbps() macro.
Note that if_baudrate is a long, and 32-bits isn't enough to properly represent 10Gb/s.
Pointed out by: dwhite
|
#
177104 |
|
12-Mar-2008 |
gallatin |
Remove dead code which makes a call to mem_range_attr_set(). This fixes a bug where mxge did not declare a dependancy on mem(4), and failed to load with options nomem.
Pointed out by: antoine
|
#
176281 |
|
14-Feb-2008 |
gallatin |
Now that mxge supports MSI-X interrupts, reverse the logic and flag legacy interrupts rather than MSI as a special case. Prior to this commit, the interrupt handler was doing the slow handshaking with the device to ensure the legacy interrupt was lowered in both the legacy and MSI-X case. This handshaking was not required for MSI-X.
|
#
176261 |
|
13-Feb-2008 |
gallatin |
Add minimally invasive shims to ease MFCs of mxge back as far as RELENG_6
Sponsored by: Myricom, Inc.
|
#
175757 |
|
28-Jan-2008 |
gallatin |
Only reset driver state when a hardware error is detected. Preserve warning but do not reset if we enter the routine without seeing a hardware error.
|
#
175579 |
|
22-Jan-2008 |
gallatin |
Take advantage of the new physically contiguous 9K jumbos in 8.
|
#
175365 |
|
15-Jan-2008 |
gallatin |
Add optional support to mxge for MSI-X interrupts and multiple receive queues (which we call slices). The NIC will steer traffic into up to hw.mxge.max_slices different receive rings based on a configurable hash type (hw.mxge.rss_hash_type).
Currently the driver defaults to using a single slice, so the default behavior is unchanged. Also, transmit from non-zero slices is disabled currently.
|
#
172162 |
|
13-Sep-2007 |
gallatin |
Add support for a new device id (9). Mxge NICs with the new device id support MSI-X.
Approved by: re (bmah)
|
#
171917 |
|
22-Aug-2007 |
gallatin |
- Fix a bug which could cause a panic when enabling LRO on an down mxge interface - Fix a bug where mxge reported the link state as active when it wasn't (after ifconfig down). - Prevent spurious watchdog resets when link partner is not consuming - Add support for CX4 and popular XFP media detection - Update the firmware and associated header files to 1.4.25
Approved by: re (kensmith)
|
#
171500 |
|
19-Jul-2007 |
gallatin |
- Enable static building of mxge(4) and its firmware.
- Add custom .c wrappers for the firmware, rather than the standard firmware(9) generated firmware objects to work around toolchain problems on ia64 involving linking objects produced by ld -b -binary into the kernel.
- Move from using Myricom's ".dat" firmware blobs to using Myricom's zlib compressed ".h" firmware header files. This is done to facilitate the custom wrappers, and saves a fair amount of wired memory in the case where the firmware is built in, or preloaded.
- Fix two compile issues in mxge which only appear on non-i386/amd64.
Reviewed by: mlaier, mav (earlier version with just zlib support) Glanced at by: sam Approved by: re (kensmith)
|
#
171405 |
|
12-Jul-2007 |
gallatin |
Update the mxge(4) driver's copyright to 2007, and drop the binary distribution clause.
Approved by: re (bmah)
|
#
170853 |
|
16-Jun-2007 |
gallatin |
Also mark writecombine as enabled when PAT is used to enable it rather than MTRRs.
|
#
170733 |
|
14-Jun-2007 |
gallatin |
correct some limits on interrupt proccessing so that fast forwarding back out the same mxge interface works nicely.
|
#
170626 |
|
12-Jun-2007 |
gallatin |
Use the new IFCAP_LRO to enable/disable LRO.
|
#
170559 |
|
11-Jun-2007 |
gallatin |
Small LRO related fixes for mxge: - Allow LRO to be enabled / disabled at runtime - Fix a double-free at module unload time. - Only update timestamp in lro merge when it is present in the frame Sponsored by: Myricom
|
#
170330 |
|
05-Jun-2007 |
gallatin |
Use pmap_change_attr() to setup a write combine attribute for our device memory, rather than relying on the less reliable MTRR method used by mem_range_attr_set().
Glanced at by: jhb
|
#
169995 |
|
25-May-2007 |
gallatin |
- Use m_getcl() rather than m_getjcl() when we're allocating 2KB clusters. This helps quite a bit on my low end machines (improves performance by about 300Kpps when being blasted by a hardware packet generator). - Include one extended f/w counter forgotten in earlier commit
Sponsored by: Myricom Inc.
|
#
169905 |
|
23-May-2007 |
gallatin |
Add support for "hardware" vlan tag insertion & removal emulation in the mxge driver so as to be able to do checksum offload on vlans. This is good enough to achieve 10GbE line rate on vlans.
|
#
169871 |
|
22-May-2007 |
gallatin |
mxge cleanups:
- Remove code to use the special wc_fifo. It has been disabled by default in our other drivers as it actually slows down transmit by a small amount
- Dynamically determine the amount of space required for the rx_done ring rather than hardcoding it.
- Compute the number of tx descriptors we are willing to transmit per frame as the minimum of 128 or 1/4 the tx ring size.
- Fix a typo in the tx dma tag setup which could lead to unnecessary defragging of TSO packets (and potentially even dropping TSO packets due to EFBIG being returned).
- Add a counter to keep track of how many times we've needed to defragment a frame. It should always be zero.
- Export new extended f/w counters via sysctl
Sponsored by: Myricom, Inc.
|
#
169840 |
|
21-May-2007 |
gallatin |
Improve mxge receive performance:
- Update to the latest (1.4.18) f/w. This f/w introduces a new receive mode which allows us to use FreeBSD's physically discontinuous MJUM9BYTES clusters.
- Switch the driver from chaining MJUMPAGESIZE clusters to using MJUM9BYTES clusters to avoid mbuf chaining overheads. Due to this change, people running obsolete f/w images will be limited to an MTU of PAGE_SIZE - 16.
- Add (disabled by default) support for Large Receive Offload.
Sponsored by: Myricom, Inc.
|
#
169384 |
|
08-May-2007 |
gallatin |
- Add handling of MXGEFW_CMD_UNKNOWN in mxge_send_cmd(). - Convert mxge_send_cmd result handling to a switch rather than adding a new elseif for MXGEFW_CMD_UNKNOWN
Sponsored by: Myricom Inc.
|
#
169376 |
|
08-May-2007 |
gallatin |
Firmware update & improvements to firmware selection:
- Update to latest (1.4.17) firmware.
- Use the new MXGEFW_CMD_UNALIGNED_TEST (added in firmare 1.4.16) to have the firmware tell us if the PCIe chipset supports aligned PCIe completions.
- Hard to maintain, and frequently out of date whitelist of PCIe chipsets known to produce aligned completions removed, as it has been replaced in its role of selecting the correct firmware to run by the use of MXGEFW_CMD_UNALIGNED_TEST.
- Break the dma test out of mxge_reset() and into its own function (mxge_dma_test()) so it can be used by both the normal DMA test, and to run the unaligned test.
- Improved support for enabling ECRCs
Sponsored by: Myricom Inc.
|
#
169070 |
|
27-Apr-2007 |
gallatin |
-Fix an mbuf leak caused by a cut&paste bug where the small ring's mbufs were never freed, but the big ring was freed twice. -Don't supply rx hw csums for frames which are padded beyond the length specified in the ip header. If the padding is non-zero, the hw csum will be incorrect for such frames.
Sponsored by: Myricom
|
#
168298 |
|
03-Apr-2007 |
gallatin |
- Fix a bug in the TSO transmit routine where frames which had been defragged and had their headers in the same cluster as their payload would be fed to the NIC in header-sized chunks, and would likely exceed the number of available transmit descriptors.
- If a TSO frame exceeds the number of available transmit descriptors, don't leak busdmma resources when freeing it.
Sponsored by: Myricom Inc.
|
#
168191 |
|
31-Mar-2007 |
jhb |
Optimize sx locks to use simple atomic operations for the common cases of obtaining and releasing shared and exclusive locks. The algorithms for manipulating the lock cookie are very similar to that rwlocks. This patch also adds support for exclusive locks using the same algorithm as mutexes.
A new sx_init_flags() function has been added so that optional flags can be specified to alter a given locks behavior. The flags include SX_DUPOK, SX_NOWITNESS, SX_NOPROFILE, and SX_QUITE which are all identical in nature to the similar flags for mutexes.
Adaptive spinning on select locks may be enabled by enabling the ADAPTIVE_SX kernel option. Only locks initialized with the SX_ADAPTIVESPIN flag via sx_init_flags() will adaptively spin.
The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock() are now performed inline in non-debug kernels. As a result, <sys/sx.h> now requires <sys/lock.h> to be included prior to <sys/sx.h>.
The new kernel option SX_NOINLINE can be used to disable the aforementioned inlining in non-debug kernels.
The size of struct sx has changed, so the kernel ABI is probably greatly disturbed.
MFC after: 1 month Submitted by: attilio Tested by: kris, pjd
|
#
167942 |
|
27-Mar-2007 |
gallatin |
Fix a bug which could lead to receive side lockup when WC is disabled. When submitting rx buffers and not using WC fifo, always replace the invalid DMA address with the real one, otherwise allocation failures could lead to the invalid DMA address being given to the NIC, and that would cause the receive side to lockup.
|
#
166901 |
|
23-Feb-2007 |
piso |
o break newbus api: add a new argument of type driver_filter_t to bus_setup_intr()
o add an int return code to all fast handlers
o retire INTR_FAST/IH_FAST
For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current
Reviewed by: many Approved by: re@
|
#
166875 |
|
21-Feb-2007 |
gallatin |
Work around a firmware bug where broadcast frames would be incorrectly treated as multicast frames and filtered, but when only when "adopting" running firmware. By "adopting", I mean using pre-existing firmware loaded from eeprom at PCI reset, rather than firmware loaded by the driver.
|
#
166756 |
|
15-Feb-2007 |
luigi |
Cleanup and document the implementation of firmware(9) based on a version that i posted earlier on the -current mailing list, and subsequent feedback received.
The core of the change is just in sys/firmware.h and kern/subr_firmware.c, while other files are just adaptation of the clients to the ABI change (const-ification of some parameters and hiding of internal info, so this is fully compatible at the binary level).
In detail: - reduce the amount of information exported to clients in struct firmware, and constify the pointer;
- internally, document and simplify the implementation of the various functions, and make sure error conditions are dealt with properly.
The diffs are large, but the code is really straightforward now (i hope).
Note also that there is a subtle issue with the implementation of firmware_register(): currently, as in the previous version, we just store a reference to the 'imagename' argument, but we should rather copy it because there is no guarantee that this is a static string. I realised this while testing this code, but i prefer to fix it in a later commit -- there is no regression with respect to the past.
Note, too, that the version in RELENG_6 has various bugs including missing locks around the module release calls, mishandling of modules loaded by /boot/loader, and so on, so an MFC is absolutely necessary there. I was just postponing it until this cleanup to avoid doing things twice.
MFC after: 1 week
|
#
166373 |
|
31-Jan-2007 |
gallatin |
- Add 99% of a callout based watchdog. The remaining 1% is waiting for pci_cfg_restore() to be exported. It was tested using a hackily accessed pci_cfg_restore().
- Add ifmedia_removeall() to mxge_detach() in order to stop leaking an ifaddr
- Fix a small acounting bug introduced by the locking code shuffle which could cause spurious watchdog resets now that we have a watchdog.
Sponsored by: Myricom
|
#
166371 |
|
31-Jan-2007 |
gallatin |
destroy busdma maps even if they are NULL, so as to avoid leaking busdma tags.
|
#
166370 |
|
31-Jan-2007 |
gallatin |
Abandon using sleepable locks in favor of mutexes for mxge's if_ioctl locking in preparation for adding a watchdog handler (callouts must not use sleepable locks). This required shuffling memory and interrupt allocation to the attach routine rather than if_ioctl so as to avoid potential sleeps while bringing up the interface.
|
#
166345 |
|
30-Jan-2007 |
gallatin |
Minor updates:
- initialize ifq_drv_maxlen correctly - mark the interface as jumbo capable - keep stats on the number of times the hw transmit queue filled and was restarted.
|
#
164751 |
|
29-Nov-2006 |
gallatin |
Fix mxge_submit_8rx() to behave like the comments says it does, and ensure that it copies at most 32 bytes at a time.
|
#
164520 |
|
22-Nov-2006 |
gallatin |
Fix transposition of width and value arguments to pci_config_write() when setting up the read request size.
Pointed out by: kmacy
|
#
164513 |
|
22-Nov-2006 |
gallatin |
Initialization bugfixes and enhancements:
- Fix bug preventing adoption of running firmware - Set PCIe max read request size to 4KB - Read PCIe link width from config space - Assume aligned completions from the southbridge ports of intel E5000 chips - Use aligned firmware when link width is x4 or less - Add hw.mxge.force_firmware tunable to allow user to force selection of aligned (or unaligned) firmware
|
#
164472 |
|
21-Nov-2006 |
gallatin |
Added MSI support.
Sponsored by: Myricom Inc.
|
#
163467 |
|
17-Oct-2006 |
gallatin |
Fix a driver bug which could result in frames MHLEN or (MHLEN - 1) bytes long being DMA'ed 2 (or 1) bytes past the end of the mbuf and corrupting random kernel memory. I had forgotten about the 2 bytes of implict padding the firmware assumes.
Sponsored by: Myricom Inc.
|
#
162328 |
|
15-Sep-2006 |
gallatin |
- Updated to the latest myri10ge firmware - Added support for multicast filtering, now that the firmware supports it. Note that this is not yet tested, as multicast seems to panic -current (even w/o mxge loaded) - Added workaround to cope with different irq data struct size on pre-multicast firmware which can found running on nics. - Added Intel E5000 PCIe chipsets to list providing aligned completions. - Replaced various magic constants with #defines, now that they are defined in the firmware headers.
|
#
162322 |
|
15-Sep-2006 |
gallatin |
- Added TSO support. This entailed increasing the number of send descriptors in the transmit busdma tag, so I moved the segment list off the stack.
- Fixed transmit routine to ensure it doesn't read past the end of an mbuf when parsing headers.
- Corrected handling of odd length segments. Setting MXGEFW_FLAGS_ALIGN_ODD is required only when offloading the checksum of that frame.
Sponsored by: Myricom Inc.
|
#
160973 |
|
04-Aug-2006 |
gallatin |
Copy the link-layer address from our ifnet pointer at reset time so that the mac address can be overridden.
|
#
160876 |
|
01-Aug-2006 |
gallatin |
- add read only sysctl to indicate if write-combining was enabled - enable mxge_dummy_rdma() right after reset, and make sure to disable when detaching the driver.
|
#
160456 |
|
17-Jul-2006 |
gallatin |
Firmware loading improvements:
- Copy ethernet firmware down in small chunks so as to avoid bugs in early versions of the bootstrap firmware. - Attempt to "adopt" the running firmware if we cannot load a suitable firmware image via firmware(9). - Separate firmware validation into its own routine, and check the major/minor driver/firmware ABI version.
|
#
159623 |
|
14-Jun-2006 |
gallatin |
Much to my surprise, IFQ_DRV_DEQUEUE() can return a null mbuf even if !IFQ_DRV_IS_EMPTY(). Taking this into account, I re-structured the transmit routine so as to avoid adding another if/then in the critical path.
Thanks to brueffer for showing my how to test with altq/pf.
|
#
159621 |
|
14-Jun-2006 |
gallatin |
Replace a sc->ifp->if_snd.ifq_drv_maxlen with IFQ_SET_MAXLEN(), and call IFQ_SET_READY().
Submitted by: brueffer
|
#
159612 |
|
14-Jun-2006 |
gallatin |
Update the mxge driver.
- Update the firmware to the latest released firmware (1.4.3), which corresponds to the firmware in the latest shipping drivers from Myricom. This firmware fixes several bugs in the firmware's PCI-e implementation, and it also changes the driver/firmware interface:
o TSO was added, and changed the format of the transmit descriptors. o The firmware no longer counts transmits descriptors, but frames. So the driver needs to keep a count of the number of frames sent. o The weird interrupt strategy changed to a normal receive return ring. This ring is much bigger, and we may be able to support DEVICE_POLLING. o Myricom's header files changed the name of firmware related #define's and enums (s/_MCP_/FW_).
- Stopped spamming the console with lots of printfs unless mxge_verbose (or bootverbose) is set.
- Made additional information available via sysctl, including the results of a PCI-e DMA benchmark run at device reset.
- Decreased the excessively long timeouts when sending commands from 2 seconds to 20ms.
Sponsored by: Myricom Inc.
|
#
159571 |
|
13-Jun-2006 |
gallatin |
- Complete the myri10ge -> mxge name change by doing a mechanical s/myri10ge/mxge/g replacement in the myri10ge files. A few contuation lines were joined because of the regained columns. - Hook the mxge driver back to the build.
|
#
158651 |
|
16-May-2006 |
phk |
Since DELAY() was moved, most <machine/clock.h> #includes have been unnecessary.
|
#
155852 |
|
19-Feb-2006 |
gallatin |
10GbE mode driver and binary firmware for Myricom's PCI-express NICs. More info regarding these nics can be found at http://www.myri.com.
Please note that the files sys/dev/myri10ge/{mcp_gen_header.h,myri10ge_mcp.h} are internally shared between all our drivers (solaris, macosx, windows, linux, etc). I'd like to keep these files unchanged, so I can just import newer versions of them when the firmware API/ABI changes. This means I'm stuck with some of the crazy-long #define names, and possibly non-style(9) characteristics of these files.
Many thanks to mlaier for doing firmware(9) just as I needed it, and to scottl for his helpful review.
Reviewed by: scottl, glebius Sponsored by: Myricom Inc.
|