369514 |
23-Mar-2021 |
git2svn |
netmap: fix issues in nm_os_extmem_create()
- Call vm_object_reference() before vm_map_lookup_done().
- Use vm_mmap_to_errno() to convert vm_map_* return values to errno.
- Fix memory leak of e->obj.
Reported by: markj Reviewed by: markj MFC after: 1 week
(cherry picked from commit ee7ffaa2e6e08b63efb4673610875d40964d5058)
Git Hash: e36c2f704635a101e993fa2d1890bd44c33ebcdd Git Author: vmaffione@FreeBSD.org |
369479 |
18-Mar-2021 |
git2svn |
netmap: fix memory leak in NETMAP_REQ_PORT_INFO_GET
The netmap_ioctl() function has a reference counting bug in case of NETMAP_REQ_PORT_INFO_GET command. When `hdr->nr_name[0] == '\0'`, the function does not decrease the refcount of "nmd", which is increased by netmap_mem_find(), causing a refcount leak.
Reported by: Xiyu Yang <sherllyyang00@gmail.com> Submitted by: Carl Smith <carl.smith@alliedtelesis.co.nz> MFC after: 3 days PR: 254311
Git Hash: 4019787f50a2826e9a4bba6e70868467b3d6081a Git Author: vmaffione@FreeBSD.org |
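The leak pattern described in r369479 can be sketched in userspace C. This is a simplified model under stated assumptions, not the netmap code: `struct mem`, `mem_find()`, `mem_put()` and `port_info_get()` are hypothetical stand-ins for the memory allocator object, netmap_mem_find(), its release counterpart, and the ioctl path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted memory allocator. */
struct mem {
	int refcount;
};

/* Models a lookup that returns the object with an extra reference. */
static struct mem *
mem_find(struct mem *m)
{
	m->refcount++;
	return (m);
}

/* Drops one reference. */
static void
mem_put(struct mem *m)
{
	assert(m->refcount > 0);
	m->refcount--;
}

/*
 * Models the request handler.  When the port name is empty the
 * allocator is looked up directly; every exit from that path must
 * drop the reference taken by mem_find().
 */
static int
port_info_get(struct mem *pool, const char *name, int *info_out)
{
	struct mem *nmd;

	if (name[0] != '\0')
		return (-1);	/* named-port path is not modelled here */

	nmd = mem_find(pool);
	*info_out = 42;		/* placeholder for the copied-out info */
	mem_put(nmd);		/* the fix: release on this path too */
	return (0);
}
```

Without the `mem_put()` call, each request with an empty name would leave the count one higher than before, which is the leak the commit describes.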
364756 |
25-Aug-2020 |
vmaffione |
MFC r364341
netmap: fix parsing of legacy nmr->nr_ringid
Code was checking for NETMAP_{SW,HW}_RING in req->nr_ringid which had already been masked by NETMAP_RING_MASK. Therefore, the comparisons always failed and set NR_REG_ALL_NIC. Check against the original nmr structure.
Submitted by: bpoole@packetforensics.com Reported by: bpoole@packetforensics.com Reviewed by: giuseppe.lettieri@unipi.it Approved by: vmaffione |
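The parsing bug fixed in r364341 can be illustrated with a small sketch. The flag values and the names `parse_masked()` and `parse_original()` are assumptions made for this example; the real definitions are in net/netmap.h and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values; consult net/netmap.h for the real definitions. */
#define RING_MASK	0x0fffu	/* low bits: ring index */
#define SW_RING		0x2000u	/* flag: host (software) ring */
#define HW_RING		0x4000u	/* flag: one specific hardware ring */

enum regmode { REG_ALL_NIC, REG_SW, REG_ONE_NIC };

/* Buggy variant: tests the flags on a value that was already masked. */
static enum regmode
parse_masked(uint16_t legacy_ringid)
{
	uint16_t ringid = legacy_ringid & RING_MASK;	/* flags are gone */

	if (ringid & SW_RING)
		return (REG_SW);
	if (ringid & HW_RING)
		return (REG_ONE_NIC);
	return (REG_ALL_NIC);	/* always reached */
}

/* Fixed variant: tests the flags on the original, unmasked field. */
static enum regmode
parse_original(uint16_t legacy_ringid)
{
	if (legacy_ringid & SW_RING)
		return (REG_SW);
	if (legacy_ringid & HW_RING)
		return (REG_ONE_NIC);
	return (REG_ALL_NIC);
}

/* Usage: the same legacy value parsed both ways. */
static void
ringid_demo(void)
{
	uint16_t v = (uint16_t)(HW_RING | 3);

	assert(parse_masked(v) == REG_ALL_NIC);		/* flag lost */
	assert(parse_original(v) == REG_ONE_NIC);	/* flag seen */
}
```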
359310 |
25-Mar-2020 |
vmaffione |
netmap: ixl: add CRC to outbound frames
With this change, ixl netmap_txsync instructs the NIC to add CRC to transmitted frames.
Submitted by: Alexandre Snarskii <snar@snar.spb.ru> Reviewed by: vmaffione |
357961 |
15-Feb-2020 |
vmaffione |
MFC r357663
netmap: improve netmap(4) and vale(4) man pages
Clean up obsolete sysctl descriptions and add missing ones.
PR: 243838 Reviewed by: bcr Differential Revision: https://reviews.freebsd.org/D23546 |
357278 |
29-Jan-2020 |
vmaffione |
MFC r357159
netmap_mem_unmap: fix NULL pointer dereference |
356805 |
16-Jan-2020 |
vmaffione |
MFC r356704
netmap: disable passthrough with no hypervisor support
The netmap passthrough subsystem requires proper support in the hypervisor. In particular, two PCI device ids (from the Red Hat PCI vendor id 0x1b36) need to be assigned to the two netmap virtual devices. We therefore keep these devices disabled until the ids have been assigned, in order to avoid conflicts with other virtual devices emulated by upstream QEMU.
PR: 241774 |
351772 |
03-Sep-2019 |
vmaffione |
MFC r351488
netmap: remove obsolete file
The netmap_pt.c module has become obsolete after the refactoring that added netmap_kloop.c. Remove it and unlink it from the build system. |
350010 |
15-Jul-2019 |
vmaffione |
MFC r349581
netmap: fix two panics with emulated adapter
This patch fixes 2 panics. The first one is due to the current VNET not being set in the emulated adapter transmission path. The second one is caused by the M_PKTHDR flag not being set when preallocated mbufs are recycled in the transmit path.
Submitted by: aleksandr.fedorov@itglobal.com Reviewed by: vmaffione Differential Revision: https://reviews.freebsd.org/D20824 |
350007 |
15-Jul-2019 |
vmaffione |
MFC r349966
netmap: fix bug introduced by r349752
r349752 introduced a NULL pointer reference bug in the emulated netmap code.
Reported by: lwhsu |
349922 |
11-Jul-2019 |
vmaffione |
MFC r349752
netmap: fix kernel pointer printing in netmap_generic.c
Print the adapter name rather than the address of the adapter to avoid kernel address leakage.
PR: Bug 238642 Submitted by: Fuqian Huang <huangfq.daxian@gmail.com> Reviewed by: vmaffione |
345668 |
29-Mar-2019 |
rpokala |
MFC r339683: Remove redundant redeclaration of netmap_vp_reg(). This should unbreak sparc64 and powerpc LINT builds.
-- While this does fix that error, powerpc.LINT, powerpc.LINT64, and -- sparc64.LINT are broken in stable/11 for other reasons. --rpokala
Sponsored by: Panasas |
344658 |
28-Feb-2019 |
vmaffione |
MFC r344510
netmap: remove redundant call to nm_set_native_flags()
This redundant call was introduced by mistake in r343772.
Sponsored by: Sunny Valley Networks |
344509 |
25-Feb-2019 |
vmaffione |
MFC r343579, r344253
netmap: fix lock order reversal related to kqueue usage
When using poll(), select() or kevent() on netmap file descriptors, netmap executes the equivalent of NIOCTXSYNC and NIOCRXSYNC commands, before collecting the events that are ready. In other words, the poll/kevent callback has side effects. This is done to avoid the overhead of two system calls per iteration (e.g., poll() + ioctl(NIOC*XSYNC)).
When the kqueue subsystem invokes the kqueue(9) f_event callback (netmap_knrw), it holds the lock of the struct knlist object associated with the netmap port (the lock is provided at initialization, by calling knlist_init_mtx). However, netmap_knrw() may need to wake up another netmap port (or even the same one), which means that it may need to call knote(). Since knote() needs the lock of the struct knlist object associated with the to-be-woken-up netmap port, it is possible to have a lock order reversal problem (AB/BA deadlock).
This change prevents the deadlock by executing the knote() call in a per-selinfo taskqueue, where it is possible to hold a mutex. The change also adds a counter (kqueue_users) to keep track of how many kqueue users are referencing a given struct nm_selinfo. In this way, nm_os_selwakeup() can schedule the kevent notification task only when kqueue is actually being used. This is important to avoid wasting CPU in the common case where kqueue is not used.
Reviewed by: aleksandr.fedorov_itglobal.com Differential Revision: https://reviews.freebsd.org/D18956 |
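The deferral scheme described in r343579/r344253 can be modelled without any real locking. In this sketch `struct si_model` and its functions are hypothetical names: the wake-up path never notifies directly, it only schedules a task, and only when the `kqueue_users` counter says someone is listening.

```c
#include <assert.h>

/* Hypothetical model of a per-port selinfo with a deferred-notify task. */
struct si_model {
	int kqueue_users;	/* knotes currently attached */
	int task_pending;	/* deferred notification scheduled */
	int knotes_delivered;	/* times the deferred notification ran */
};

/* Attach and detach track how many kqueue users reference the port. */
static void
si_attach(struct si_model *si)
{
	si->kqueue_users++;
}

static void
si_detach(struct si_model *si)
{
	assert(si->kqueue_users > 0);
	si->kqueue_users--;
}

/*
 * Wake-up path.  It may run while another port's knlist lock is held,
 * so it only schedules the task, and skips even that when kqueue is
 * not in use on this port.
 */
static void
si_wakeup(struct si_model *si)
{
	if (si->kqueue_users > 0)
		si->task_pending = 1;
}

/* Task body: runs later, outside the caller's locking context. */
static void
si_run_task(struct si_model *si)
{
	if (si->task_pending) {
		si->task_pending = 0;
		si->knotes_delivered++;
	}
}
```

Because the notification runs from the task rather than from the callback, the second knlist lock is never taken while the first is held.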
344047 |
12-Feb-2019 |
vmaffione |
MFC r343772, r343867
netmap: refactor logging macros and pipes
Changelist:
- Replace ND, D and RD macros with nm_prdis, nm_prinf, nm_prerr and nm_prlim, to avoid possible naming conflicts.
- Add netmap_krings_mode_commit() helper function and use that to reduce code duplication.
- Refactor pipes control code to export some functions that can be reused by the veth driver (on Linux) and epair(4).
- Add check to reject API requests with version less than 11.
- Small code refactoring for the null adapter. |
343866 |
07-Feb-2019 |
vmaffione |
MFC r343689
netmap: upgrade sync-kloop support
Add SYNC_KLOOP_MODE option, and add support for direct mode, where the application executes the TXSYNC and RXSYNC in the context of the ioeventfd wake up callback. |
343834 |
06-Feb-2019 |
vmaffione |
MFC r343549
netmap: add notifications on kloop stop
On sync-kloop stop, send a wake-up signal to the kloop, so that waiting for the timeout is not needed. Also, improve logging in netmap_freebsd.c. |
343832 |
06-Feb-2019 |
vmaffione |
MFC r343346
netmap: improvements to the netmap kloop (CSB mode)
Changelist:
- Add the proper memory barriers in the kloop ring processing functions.
- Fix memory barriers usage in the user helpers (nm_sync_kloop_appl_write, nm_sync_kloop_appl_read).
- Fix nm_kr_txempty() helper to look at rhead rather than rcur. This is important since the kloop can read a value of rcur which is ahead of the value of rhead (see explanation in nm_sync_kloop_appl_write).
- Remove obsolete ptnetmap_guest_write_kring_csb() and ptnet_guest_read_kring_csb().
- Prepare in advance the arguments for netmap_sync_kloop_[tr]x_ring(), to make the kloop faster.
- Provide kernel and user implementation for nm_ldld_barrier() and nm_ldst_barrier(). |
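The nm_kr_txempty() item above can be illustrated with a reduced model. The struct and function names here are hypothetical, and the sketch assumes the helper's meaning is "no space left for the application in the TX ring".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of a TX kring as seen by the kernel loop. */
struct kring_model {
	uint32_t rhead;		/* last head value read from the application */
	uint32_t rcur;		/* last cur value read from the application */
	uint32_t hwtail;	/* first slot not available to the application */
};

/* Pre-fix check: uses rcur, which may run ahead of rhead. */
static int
txempty_rcur(const struct kring_model *k)
{
	return (k->rcur == k->hwtail);
}

/* Post-fix check: uses rhead. */
static int
txempty_rhead(const struct kring_model *k)
{
	return (k->rhead == k->hwtail);
}

static void
txempty_demo(void)
{
	/* Slots 2..4 are still usable, but cur was advanced to tail. */
	struct kring_model k = { .rhead = 2, .rcur = 5, .hwtail = 5 };

	assert(txempty_rcur(&k) == 1);	/* wrongly reports "no space" */
	assert(txempty_rhead(&k) == 0);	/* space is still available */
}
```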
343831 |
06-Feb-2019 |
vmaffione |
MFC r343344
netmap: fix knote() argument to match the mutex state
The nm_os_selwakeup function needs to call knote() to wake up kqueue(9) users. However, this function can be called from different code paths, with different lock requirements. This patch fixes the knote() call argument to match the relevant lock state. Also, comments have been updated to reflect current code.
PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219846 Reported by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com> Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D18876 |
343771 |
05-Feb-2019 |
vmaffione |
netmap: small cleanup on em, lem, igb, ixgbe
Replace D, ND and RD macros with the corresponding nm_pr* ones. |
343559 |
29-Jan-2019 |
vmaffione |
ixl: remove unnecessary limitations related to netmap
Netmap supports the case where TX rings and RX rings have different size. Remove unnecessary limitations related to netmap support, making the code simpler. Also, check that the value of the hw head index written back from the NIC is valid.
Reviewed by: erj Differential Revision: https://reviews.freebsd.org/D18984 |
343522 |
28-Jan-2019 |
vmaffione |
MFC r343413
netmap: fix crash with monitors and VALE ports
Crash report described here: https://github.com/luigirizzo/netmap/issues/583 Fixed by providing dummy sync callback in case it is missing. |
342648 |
31-Dec-2018 |
vmaffione |
MFC r342368, r342369
netmap: fix bug in netmap_poll() optimization
The bug was introduced by r339639, although it is present in the upstream netmap code since 2015. It is due to resetting the want_rx variable to POLLIN, rather than resetting it to POLLIN|POLLRDNORM. It only affects select(), which uses POLLRDNORM. poll() is not affected, because it uses POLLIN. Also, it only affects FreeBSD, because Linux skips the optimization implemented by the piece of code where the bug occurs. To check if txsync can be skipped, it is necessary to look for unseen TX space. However, this means comparing ring->cur against ring->tail, rather than ring->head against ring->tail (like nm_ring_empty() does).
Sponsored by: Sunny Valley Networks |
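Both points in r342368/r342369 can be shown with a short sketch. The function and struct names are hypothetical, the model is far simpler than netmap_poll(), and it assumes POLLIN and POLLRDNORM are distinct bits, as they are on FreeBSD and Linux.

```c
#define _XOPEN_SOURCE 700	/* expose POLLRDNORM on strict compilers */
#include <assert.h>
#include <poll.h>
#include <stdint.h>

/* Readiness reported to a caller, given the mask used to re-arm want_rx. */
static short
rx_revents(short events, short rearm_mask)
{
	if ((events & (POLLIN | POLLRDNORM)) == 0)
		return (0);	/* caller is not interested in RX */
	return ((short)(events & rearm_mask));
}

/* Hypothetical subset of a netmap ring's indices. */
struct ring_model {
	uint32_t head, cur, tail;
};

/* Unseen TX space exists when cur has not caught up with tail. */
static int
tx_space_unseen(const struct ring_model *r)
{
	return (r->cur != r->tail);
}

static void
poll_demo(void)
{
	struct ring_model r = { .head = 2, .cur = 6, .tail = 6 };

	/* A select()-style caller asks for POLLRDNORM only. */
	assert(rx_revents(POLLRDNORM, POLLIN) == 0);	/* event lost */
	assert(rx_revents(POLLRDNORM, POLLIN | POLLRDNORM) == POLLRDNORM);
	/* head != tail, yet no unseen space: txsync cannot be skipped. */
	assert(!tx_space_unseen(&r));
}
```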
342395 |
24-Dec-2018 |
vmaffione |
MFC r342300
netmap: move buf_size validation code to its own function
This code validates the netmap buf_size against the interface MTU and maximum descriptor size, to make sure the values are consistent. Moving this functionality to its own function is needed because this function is also called by Linux-specific code. |
342394 |
24-Dec-2018 |
vmaffione |
MFC r342299
netmap: pipes: make sure both ends use the same number of slots |
342131 |
15-Dec-2018 |
vmaffione |
MFC r341992
netmap: fix warning in netmap_kloop.c
Reported by: markj |
342034 |
13-Dec-2018 |
vmaffione |
MFC r341624
netmap: netmap_transmit should honor bpf packet tap hook
This allows tcpdump to capture outbound kernel packets while in netmap mode
Submitted by: Marc de la Gueronniere <mdelagueronniere@verisign.com> Reviewed by: vmaffione MFC after: 1 week Sponsored by: Verisign, Inc. Differential Revision: https://reviews.freebsd.org/D17896 |
342033 |
13-Dec-2018 |
vmaffione |
MFC r341516, r341589
netmap: align codebase to the current upstream (760279cfb2730a585)
Changelist:
- Replace netmap passthrough host support with a more general mechanism to call TXSYNC/RXSYNC from an in-kernel event-loop. No kernel threads are needed to use this feature: the application is required to spawn a thread (or a process) and issue a SYNC_KLOOP_START (NIOCCTRL) command in the thread body. The kernel loop is executed by the ioctl implementation, which returns to userspace only when a different thread calls SYNC_KLOOP_STOP or the netmap file descriptor is closed.
- Update the if_ptnet driver to cope with the new data structures, and prune all the obsolete ptnetmap code.
- Add support for "null" netmap ports, useful to allocate netmap_if, netmap_ring and netmap buffers to be used by specialized applications (e.g. hypervisors). TXSYNC/RXSYNC on these ports have no effect.
- Various fixes and code refactoring.
Sponsored by: Sunny Valley Networks Differential Revision: https://reviews.freebsd.org/D18015 |
341480 |
04-Dec-2018 |
vmaffione |
MFC r341144
netmap: set IFCAP_NETMAP in if_capabilities
Revision r307394 removed (by mistake) the code that sets IFCAP_NETMAP in if_capabilities on netmap_attach. This patch reverts this change.
Reviewed by: np Approved by: gnn (mentor) Differential Revision: https://reviews.freebsd.org/D17987 |
341478 |
04-Dec-2018 |
vmaffione |
MFC r340436
vtnet: fix netmap support
netmap(4) support for vtnet(4) was incomplete and had multiple bugs. This commit fixes those bugs to bring netmap on vtnet into a functional state.
Changelist:
- handle errors returned by virtqueue_enqueue() properly (they were previously ignored)
- make sure that either netmap or the rest of the kernel, but not both, accesses each virtqueue.
- compute the number of netmap slots for TX and RX separately, according to whether indirect descriptors are used or not for a given virtqueue.
- make sure sglist are freed according to their type (mbufs or netmap buffers)
- add support for multiqueue and netmap host (aka sw) rings.
- intercept VQ interrupts directly instead of intercepting them in txq_eof and rxq_eof. This simplifies the code and makes it easier to make sure taskqueues are not running for a VQ while it is in netmap mode.
- implement vtnet_netmap_config() to cope with changes in the number of queues.
Reviewed by: bryanv Approved by: gnn (mentor) Sponsored by: Sunny Valley Networks Differential Revision: https://reviews.freebsd.org/D17916 |
341477 |
04-Dec-2018 |
vmaffione |
MFC r339639
netmap: align codebase to the current upstream (sha 8374e1a7e6941)
Changelist:
- Move large parts of VALE code to a new file and header netmap_bdg.[ch]. This is useful to reuse the code within upcoming projects.
- Improvements and bug fixes to pipes and monitors.
- Introduce nm_os_onattach(), nm_os_onenter() and nm_os_onexit() to handle differences between FreeBSD and Linux.
- Introduce some new helper functions to handle more host rings and fake rings (netmap_all_rings(), netmap_real_rings(), ...)
- Added new sysctl to enable/disable hw checksum in emulated netmap mode.
- nm_inject: add support for NS_MOREFRAG
Approved by: gnn (mentor) Differential Revision: https://reviews.freebsd.org/D17364 |
341275 |
30-Nov-2018 |
dab |
MFC r337812,r337814,r337820,r341068:
Fix several memory leaks (r337812 & r337814).
The libkqueue tests have several places that leak memory by using an idiom like:
puts(kevent_to_str(kevp));
Rework to save the pointer returned from kevent_to_str() and then free() it after it has been used.
r337812 also fixed a bug in the netmap kevent code. The inclusion of that fix was an oversight that I didn't notice until this MFC. Reference the code review and PR here in the MFC for completeness.
r337820 & r341068 were white-space only changes as a follow-up to r337812 & r337814:
After r337820, which "corrected" some spaces-instead-of-tab whitespace issues in the libkqueue tests, jmg@ pointed out that these files were originally space-based, not tab-spaced, and so the correction should have been to get rid of the tabs that had been introduced in previous changes, not the spaces. This change does that. This is a whitespace only change; no functional change is intended.
PR: 206053 Differential Revision: https://reviews.freebsd.org/D16531 Sponsored by: Dell EMC Isilon |
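The rework described for r337812 and r337814 follows a common ownership pattern. The sketch below uses `event_to_str()` as a hypothetical stand-in for kevent_to_str(), assuming only that the returned string is heap-allocated and owned by the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for kevent_to_str(): the caller owns the result. */
static char *
event_to_str(int ident)
{
	char *buf = malloc(32);

	if (buf != NULL)
		snprintf(buf, 32, "ident=%d", ident);
	return (buf);
}

/*
 * Leaky idiom:  puts(event_to_str(ident));
 * The pointer is never stored, so it can never be freed.
 *
 * Reworked idiom: keep the pointer, use it, then free() it.
 */
static int
print_event(int ident)
{
	char *s = event_to_str(ident);
	int rc;

	if (s == NULL)
		return (-1);
	rc = puts(s);
	free(s);
	return (rc == EOF ? -1 : 0);
}

static void
print_demo(void)
{
	assert(print_event(7) == 0);	/* prints "ident=7" */
}
```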
331722 |
29-Mar-2018 |
eadler |
Revert r330897:
This was intended to be a non-functional change. It wasn't. The commit message was thus wrong. In addition it broke arm, and merged crypto related code.
Revert with prejudice.
This revert skips files touched in r316370 since that commit was since MFCed. This revert also skips files that require $FreeBSD$ property changes.
Thank you to those who helped me get out of this mess including but not limited to gonzo, kevans, rgrimes.
Requested by: gjb (re) |
330897 |
14-Mar-2018 |
eadler |
Partial merge of the SPDX changes
These changes are incomplete but are making it difficult to determine what other changes can/should be merged.
No objections from: pfg |
312783 |
25-Jan-2017 |
loos |
Fix a crash in netmap when using the emulated mode.
This is a direct commit to stable/11 as the -head version was already fixed by a recent import of a new netmap version.
Submitted by: Vincenzo Maffione <v.maffione@gmail.com> Sponsored by: Rubicon Communications, LLC (Netgate) |
308131 |
31-Oct-2016 |
sbruno |
MFC r308038:
The buffer address is always overwritten in the extended descriptor format, we have to refresh it ... always. This fixes problems reported in NetMap with em(4) devices after conversion to extended descriptor format in svn r293331. |
302408 |
08-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
300050 |
17-May-2016 |
eadler |
Don't repeat the the word 'the'
(one manual change to fix grammar)
Confirmed With: db Approved by: secteam (not really, but this is a comment typo fix)
|
298955 |
03-May-2016 |
pfg |
sys/dev: minor spelling fixes.
Most affect comments, very few have user-visible effects.
|
297298 |
26-Mar-2016 |
np |
Plug leak in m_unshare.
m_unshare passes on the source mbuf's flags as-is to m_getcl and this results in a leak if the flags include M_NOFREE. The fix is to clear the bits not listed in M_COPYALL before calling m_getcl. M_RDONLY should probably be filtered out too but that's outside the scope of this fix.
Add assertions in the zone_mbuf and zone_pack ctors to catch similar bugs.
Update netmap_get_mbuf to not pass M_NOFREE to m_getcl. It's not clear what the original code was trying to do but it's likely incorrect. Updated code is no different functionally but it avoids the newly added assertions.
Reviewed by: gnn@ Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5698
|
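The m_unshare fix in r297298 amounts to masking the source flags before they reach the allocator. The flag values and the `buf_model` type below are illustrative assumptions, not the definitions from sys/mbuf.h.

```c
#include <assert.h>

/* Illustrative flag bits; not the values from sys/mbuf.h. */
#define F_PKTHDR	0x01u	/* copyable */
#define F_BCAST		0x02u	/* copyable */
#define F_NOFREE	0x04u	/* storage must not be freed */
#define F_COPYALL	(F_PKTHDR | F_BCAST)

struct buf_model {
	unsigned int flags;
	int freed;
};

/* Free routine that honours the no-free flag. */
static void
buf_free(struct buf_model *b)
{
	if (b->flags & F_NOFREE)
		return;		/* skipped: a heap-allocated copy leaks */
	b->freed = 1;
}

/* The fix: only copyable flags are passed on to the new buffer. */
static unsigned int
copy_flags(unsigned int src_flags)
{
	return (src_flags & F_COPYALL);
}

static void
unshare_demo(void)
{
	unsigned int src = F_PKTHDR | F_NOFREE;
	struct buf_model leaky = { src, 0 };
	struct buf_model fixed = { copy_flags(src), 0 };

	buf_free(&leaky);
	buf_free(&fixed);
	assert(leaky.freed == 0);	/* inherited F_NOFREE: leaked */
	assert(fixed.freed == 1);	/* flag filtered: freed normally */
}
```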
295126 |
01-Feb-2016 |
glebius |
These files were getting sys/malloc.h and vm/uma.h with header pollution via sys/mbuf.h
|
293331 |
07-Jan-2016 |
sbruno |
Switch em(4) to the extended RX descriptor format. This matches the e1000/e1000e split in linux.
Split rxbuffer and txbuffer apart to support the new RX descriptor format structures. Move rxbuffer manipulation to em_setup_rxdesc() to unify the new behavior changes.
Add a RSSKEYLEN macro for help in generating the RSSKEY data structures in the card.
Change em_receive_checksum() to process the new rxdescriptor format status bit.
MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D3447
|
292730 |
25-Dec-2015 |
kevlo |
Fix typo (s/harware/hardware/)
|
287543 |
07-Sep-2015 |
adrian |
Don't call enable_all_rings if the adapter has been freed.
This is a subtle use-after-free race that results in some very undesirable hang behaviour.
Reviewed by: pkelsey Obtained from: Kip Macy, NextBSD (https://github.com/NextBSD/NextBSD/commit/91a9bd1dbb33dafb41684d054e59d73976de9654)
|
285699 |
19-Jul-2015 |
luigi |
add a use count so the netmap module cannot be unloaded while in use.
|
285698 |
19-Jul-2015 |
luigi |
properly destroy persistent vale ports
|
285697 |
19-Jul-2015 |
luigi |
do not free NULL if pipe allocation fails
|
285696 |
19-Jul-2015 |
luigi |
release a reference when stopping a monitor
|
285695 |
19-Jul-2015 |
luigi |
small documentation update
|
285592 |
15-Jul-2015 |
pkelsey |
Add netmap support for ixgbe SRIOV VFs (that is, to if_ixv).
Differential Revision: https://reviews.freebsd.org/D2923 Reviewed by: erj, gnn Approved by: jmallett (mentor) Sponsored by: Norse Corp, Inc.
|
285445 |
13-Jul-2015 |
luigi |
set the refcount for the structure (dropped by mistake in the last commit).
|
285359 |
10-Jul-2015 |
luigi |
staticize functions only used in netmap.c (detected by jenkins run with gcc 4.9)
Update documentation on the use of netmap_priv_d, rename the refcount and use the same structure in FreeBSD and linux
No functional changes.
|
285349 |
10-Jul-2015 |
luigi |
Sync netmap sources with the version in our private tree. This commit contains large contributions from Giuseppe Lettieri and Stefano Garzarella, is partly supported by grants from Verisign and Cisco, and brings in the following:
- fix zerocopy monitor ports and introduce copying monitor ports (the latter are lower performance but give access to all traffic in parallel with the application)
- exclusive open mode, useful to implement solutions that recover from crashes of the main netmap client (suggested by Patrick Kelsey)
- revised memory allocator in preparation for the 'passthrough mode' (ptnetmap) recently presented at bsdcan. ptnetmap is described in S. Garzarella, G. Lettieri, L. Rizzo; Virtual device passthrough for high speed VM networking, ACM/IEEE ANCS 2015, Oakland (CA) May 2015 http://info.iet.unipi.it/~luigi/research.html
- fix rx CRC handling on ixl
- add module dependencies for netmap when building drivers as modules
- minor simplifications to device-specific routines (*txsync, *rxsync)
- general code cleanup (remove unused variables, introduce macros to access rings and remove duplicate code)
Applications do not need to be recompiled, unless of course they want to use the new features (monitors and exclusive open).
Those willing to try this code on stable/10 can just update the sys/dev/netmap/*, sys/net/netmap* with the version in HEAD and apply the small patches to individual device drivers.
MFC after: 1 month Sponsored by: (partly) Verisign, Cisco
|
283959 |
03-Jun-2015 |
sbruno |
Change EM_MULTIQUEUE to a real kernconf entry and enable support for up to 2 rx/tx queues for the 82574.
Program the 82574 to enable 5 msix vectors, assign 1 to each rx queue, 1 to each tx queue and 1 to the link handler.
Inspired by DragonFlyBSD, enable some RSS logic for handling tx queue handling/processing.
Move multiqueue handler functions so that they line up better in a diff review to if_igb.c
Always enqueue tx work to be done in em_mq_start, if unable to acquire the TX lock, then this will be processed in the background later by the taskqueue. Remove mbuf argument from em_start_mq_locked() as the work is always enqueued. (stolen from igb)
Setup TARC, TXDCTL and RXDCTL registers for better performance and stability in multiqueue and singlequeue implementations. Handle Intel errata 3 and generic multiqueue behavior with the initialization of TARC(0) and TARC(1)
Bind interrupt threads to cpus in order. (stolen from igb)
Add 2 new DDB functions, one to display the queue(s) and their settings and one to reset the adapter. Primarily used for debugging.
In the multiqueue configuration, bump RXD and TXD ring size to max for the adapter (4096). Setup an RDTR of 64 and an RADV of 128 in multiqueue configuration to cut down on the number of interrupts. RADV was arbitrarily set to 2x RDTR and can be adjusted as needed.
Cleanup the display in top a bit to make it clearer where the taskqueue threads are running and what they should be doing.
Ensure that both queues are processed by em_local_timer() by writing them both to the IMS register to generate soft interrupts.
Ensure that a soft interrupt is generated when em_msix_link() is run so that any races between assertion of the link/status interrupt and a rx/tx interrupt are handled.
Document existing tuneables: hw.em.eee_setting, hw.em.msix, hw.em.smart_pwr_down, hw.em.sbp
Document use of hw.em.num_queues and the new kernel option EM_MULTIQUEUE
Thanks to Intel for their continued support of FreeBSD.
Reviewed by: erj jfv hiren gnn wblock Obtained from: Intel Corporation MFC after: 2 weeks Relnotes: Yes Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D1994
|
282978 |
15-May-2015 |
pkelsey |
When a netmap process terminates without the full set of buffers it was granted via rings and ni_bufs_list_head represented in those rings and lists (e.g., via SIGKILL), those buffers are no longer available for subsequent users for the lifetime of the system. To mitigate this resource leak, reset the allocator state when the last ref to that allocator is released.
Note that this only recovers leaked resources for an allocator when there are no longer any users of that allocator, so there remain circumstances in which leaked allocator resources may not ever be recovered - consider a set of multiple netmap processes that are all using the same allocator (say, the global allocator) where members of that set may be killed and restarted over time but at any given point there is one member of that set running.
Based on initial work by adrian@.
Reviewed by: Giuseppe Lettieri (g.lettieri@iet.unipi.it), luigi Approved by: jmallett (mentor) MFC after: 1 week Sponsored by: Norse Corp, Inc.
|
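The mitigation in r282978, and the limitation noted with it, can be modelled as follows. `struct alloc_model`, `NBUFS` and the function names are hypothetical; the real allocator tracks individual buffers rather than a simple count.

```c
#include <assert.h>

#define NBUFS	8	/* hypothetical pool size */

struct alloc_model {
	int refs;	/* active users of the allocator */
	int nfree;	/* buffers currently available */
};

static void
alloc_get(struct alloc_model *a)
{
	a->refs++;
}

/* Hand out one buffer; returns 0 on success, -1 when exhausted. */
static int
buf_get(struct alloc_model *a)
{
	if (a->nfree == 0)
		return (-1);
	a->nfree--;
	return (0);
}

/*
 * Drop a reference.  The last release resets the pool, recovering any
 * buffers that a killed process never handed back.
 */
static void
alloc_put(struct alloc_model *a)
{
	assert(a->refs > 0);
	if (--a->refs == 0)
		a->nfree = NBUFS;
}
```

If a second user still holds a reference when the first one exits, the count never reaches zero and the leaked buffers stay lost, which is the remaining case the commit message describes.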
281406 |
11-Apr-2015 |
rpaulo |
netmap: improve the netmap attach message on FreeBSD.
MFC after: 1 week
|
280430 |
24-Mar-2015 |
bz |
Make ix_crcstrip a public symbol for the moment; it probably is not the right solution but I will leave it to experts to untangle this problem to properly stop the build failures.
At the moment only if_ix.c includes dev/netmap/ixgbe_netmap.h which is good as ixgbe_netmap.h defines a couple of (file) static variables--thus local to if_ix.c. static int ix_crcstrip however now also got checked from ix_txrx.c (as an extern) and should not be visible there. In fact we do see powerpc and powerpc64 build failures because of this. It is unclear to me why on other (clang built?) architectures this does not lead to a reference of an undefined symbol and similar build breakage.
|
279232 |
24-Feb-2015 |
luigi |
Add native netmap support to ixl. Preliminary tests indicate 32 Mpps on tx, 24 Mpps on rx with source and receiver on two different ports of the same 40G card. Optimizations are likely possible. The code follows closely the one for ixgbe so i do not expect stability issues.
Hardware kindly supplied by Intel.
Reviewed by: Jack Vogel MFC after: 1 week
|
279199 |
23-Feb-2015 |
luigi |
add MODULE_VERSION, needed to track module dependencies
MFC after: 3 days
|
278774 |
14-Feb-2015 |
luigi |
two minor changes from the master netmap version:
1. handle errors from nm_config(), if any (none of the FreeBSD drivers currently returns an error on this function, so this change is a no-op at this time)
2. use a full memory barrier on ioctls
|
278773 |
14-Feb-2015 |
luigi |
whitespace change: clarify the role of MAKEDEV_ETERNAL_KLD, and remove an old #ifdef __FreeBSD__ since the code is valid on all platforms.
|
277653 |
24-Jan-2015 |
adrian |
Change the permissions from 0660 to 0600.
Otherwise people in wheel can do things with netmap, including but not limited to promisc transmit/receive.
Approved by: luigi MFC after: 1 week
|
275358 |
01-Dec-2014 |
hselasky |
Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but no longer has any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file.
This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows.
"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before.
Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.
MFC after: 1 month Sponsored by: Mellanox Technologies
|
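The transmit-side check introduced in r275358 can be sketched with a reduced packet model. The types, enum values and function names are assumptions for illustration; the real macros are M_HASHTYPE_GET() and M_HASHTYPE_NONE in sys/mbuf.h.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative hash types; the real values live in sys/mbuf.h. */
enum hashtype_model { HT_NONE = 0, HT_OPAQUE = 1 };

struct pkt_model {
	enum hashtype_model rsstype;	/* says how to interpret flowid */
	uint32_t flowid;
};

/* Sender side: the hash type alone marks the flowid as valid. */
static void
pkt_set_flowid(struct pkt_model *p, uint32_t flowid)
{
	p->flowid = flowid;
	p->rsstype = HT_OPAQUE;
}

/* Driver side: pick a TX queue, falling back when no flowid is set. */
static unsigned int
pick_txq(const struct pkt_model *p, unsigned int nqueues,
    unsigned int fallback)
{
	assert(nqueues > 0);
	if (p->rsstype != HT_NONE)
		return (p->flowid % nqueues);
	return (fallback % nqueues);
}
```

A separate "flowid is valid" flag becomes unnecessary: a stale flowid is ignored as long as the hash type is still the "none" value.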
274459 |
13-Nov-2014 |
luigi |
add support for private knote lock (reduces lock contention), adapting OS_selrecord accordingly. Problem and fix suggested by adrian and jmg
|
274457 |
13-Nov-2014 |
luigi |
we need full barriers here
|
274362 |
11-Nov-2014 |
luigi |
in the Linux section, properly define the NMG_LOCK type. Also import WITH_GENERIC in preparation to adding fine-grained options to disable specific netmap components.
|
274361 |
11-Nov-2014 |
luigi |
- fix typo: use ring size from the rx ring, not the tx one (they should be the same, but just in case);
- reuse the previously computed len-1 value
|
274355 |
10-Nov-2014 |
luigi |
fix a typo
|
274354 |
10-Nov-2014 |
luigi |
initialize *color if passed as an argument
|
274353 |
10-Nov-2014 |
luigi |
sync a comment with our internal repo
|
272111 |
25-Sep-2014 |
luigi |
fix a panic when passing ifioctl from a netmap file descriptor to the underlying device. This needs to be merged to 10.1
Reported by: Patrick Kelsey MFC after: 3 days
|
272110 |
25-Sep-2014 |
luigi |
adapt the code to different freebsd versions. Not necessary to MFC
|
271849 |
19-Sep-2014 |
glebius |
Mechanically convert to if_inc_counter().
|
270874 |
31-Aug-2014 |
glebius |
Provide pointer from struct ifnet to struct netmap_adapter, instead of abusing spare field.
|
270253 |
20-Aug-2014 |
np |
Change netmap's global lock to sx instead of a mutex.
Reviewed by: luigi@ MFC after: 1 day
|
270097 |
17-Aug-2014 |
luigi |
staticize two functions, and use proper format for a struct sglist (reported by bz)
|
270063 |
16-Aug-2014 |
luigi |
Update to the current version of netmap. Mostly bugfixes or features developed in the past 6 months, so this is a 10.1 candidate.
Basically no user API changes (some bugfixes in sys/net/netmap_user.h).
In detail:
1. netmap support for virtio-net, including in netmap mode. Under bhyve and with a netmap backend [2] we reach over 1Mpps with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.
2. (kernel) add support for multiple memory allocators, so we can better partition physical and virtual interfaces giving access to separate users. The most visible effect is one additional argument to the various kernel functions to compute buffer addresses. All netmap-supported drivers are affected, but changes are mechanical and trivial
3. (kernel) simplify the prototype for *txsync() and *rxsync() driver methods. All netmap drivers affected, changes mostly mechanical.
4. add support for netmap-monitor ports. Think of it as a mirroring port on a physical switch: a netmap monitor port replicates traffic present on the main port. Restrictions apply. Drive carefully.
5. if_lem.c: support for various paravirtualization features, experimental and disabled by default. Most of these are described in our ANCS'13 paper [1]. Paravirtualized support in netmap mode is new, and beats the numbers in the paper by a large factor (under qemu-kvm, we measured guest-host throughput up to 10-12 Mpps).
A lot of refactoring and additional documentation in the files in sys/dev/netmap, but apart from #2 and #3 above, almost nothing of this stuff is visible to other kernel parts.
Example programs in tools/tools/netmap have been updated with bugfixes and to support more of the existing features.
This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.
A lot of this code has been contributed by my colleagues at UNIPI, including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.
MFC after: 3 days.
|
268530 |
11-Jul-2014 |
glebius |
Fix style bug: rename the refcount field of m_ext to ext_cnt, to match other members.
Sponsored by: Nginx, Inc.
|
267328 |
10-Jun-2014 |
luigi |
change the netmap mbuf destructor so the same code works also on FreeBSD 9. For head and 10 this change has no effect, but on stable/9 it would cause panics when using emulated netmap on top of a standard device driver.
|
267284 |
09-Jun-2014 |
luigi |
Fixes from Franco Fichtner on transparent mode
* The way rings are updated changed with the last API bump. Also sync ->head when moving slots in netmap_sw_to_nic().
* Remove a crashing selrecord() call.
* Unclog the logic surrounding netmap_rxsync_from_host().
* Add timestamping to RX host ring.
* Remove a couple of obsolete comments.
Submitted by: Franco Fichtner MFC after: 3 days Sponsored by: Packetwerk
|
267283 |
09-Jun-2014 |
luigi |
sync the code with the one in stable/10 (wrap the if_t compatibility function into a __FreeBSD_version conditional block)
|
267180 |
06-Jun-2014 |
luigi |
better handling of netmap emulation over standard device drivers: plug a potential mbuf leak, and detect bogus drivers that return ENOBUFS even when the packet has been queued.
MFC after: 3 days
|
267177 |
06-Jun-2014 |
luigi |
introduce mbq_lock() and mbq_unlock() for the mbq, so it is easier to build the same code on linux (this generalizes the change in svn 267142)
MFC after: 3 days
|
267170 |
06-Jun-2014 |
luigi |
move netmap_getna() to a freebsd-specific file
|
267165 |
06-Jun-2014 |
luigi |
align comments with the ones in our development trunk
|
267164 |
06-Jun-2014 |
luigi |
rate limit some error messages
|
267163 |
06-Jun-2014 |
luigi |
remove two debugging messages, align comments with the code in our development trunk
|
267151 |
06-Jun-2014 |
luigi |
add checks for invalid buffer pointers and lengths
|
267150 |
06-Jun-2014 |
luigi |
prevent a panic when the netdev/ifp is not set in attach (internal c63a7b85)
MFC after: 3 days
|
267142 |
06-Jun-2014 |
zont |
Use mtx_lock_spin/mtx_unlock_spin primitives on spin lock
Reviewed by: luigi MFC after: 1 week
|
267128 |
05-Jun-2014 |
luigi |
whitespace change: remove trailing whitespace
|
266974 |
02-Jun-2014 |
marcel |
Introduce a procedural interface to the ifnet structure. The new interface allows the ifnet structure to be defined as an opaque type in NIC drivers. This then allows the ifnet structure to be changed without a need to change or recompile NIC drivers.
Put differently, NIC drivers can be written and compiled once and be used with different network stack implementations, provided of course that those network stack implementations have an API and ABI compatible interface.
This commit introduces the 'if_t' type to replace 'struct ifnet *' as the type of a network interface. The 'if_t' type is defined as 'void *' to enable the compiler to perform type conversion to 'struct ifnet *' and vice versa where needed and without warnings. The functions that implement the API are the only functions that need to have an explicit cast.
The MII code has been converted to use the driver API to avoid unnecessary code churn. Code churn comes from having to work with both converted and unconverted drivers in correlation with having callback functions that take an interface. By converting the MII code first, the callback functions can be defined so that the compiler will perform the typecasts automatically.
As soon as all drivers have been converted, the if_t type can be redefined as needed and the API functions can be fixed to not need an explicit cast.
The immediate beneficiaries of this change are: 1. Juniper Networks - The network stack implementation in Junos is entirely different from FreeBSD's one and this change allows Juniper to build "stock" NIC drivers that can be used in combination with both the FreeBSD and Junos stacks. 2. FreeBSD - This change opens the door towards changing ifnet and implementing new features and optimizations in the network stack without it requiring a change in the many NIC drivers FreeBSD has.
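The opaque-handle idea described in this commit message can be sketched in a few lines of C. This is only an illustration under stated assumptions: `toy_if_t`, `struct toy_ifnet`, `toy_if_getmtu()` and `toy_if_setmtu()` are hypothetical names invented here, not the interface introduced by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opaque handle: all a driver ever sees. */
typedef void *toy_if_t;

/* Hypothetical private layout, known only to the network stack side. */
struct toy_ifnet {
	uint32_t mtu;
	uint32_t flags;
};

/* Accessors are the only place where the handle is cast back. */
static inline uint32_t
toy_if_getmtu(toy_if_t ifp)
{
	assert(ifp != NULL);
	return ((struct toy_ifnet *)ifp)->mtu;
}

static inline void
toy_if_setmtu(toy_if_t ifp, uint32_t mtu)
{
	assert(ifp != NULL);
	((struct toy_ifnet *)ifp)->mtu = mtu;
}
```

Because driver code in this sketch only calls the accessors, the fields of `struct toy_ifnet` could be reordered or extended without touching the driver source.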
Submitted by: Anuranjan Shukla <anshukla@juniper.net> Reviewed by: glebius@ Obtained from: Juniper Networks, Inc.
|
262238 |
20-Feb-2014 |
luigi |
compile with NOINET
|
262149 |
18-Feb-2014 |
luigi |
two small changes: - intercept FIONBIO and FIOASYNC ioctls on netmap file descriptors. libpcap calls them to set non blocking I/O on the file descriptor, for netmap this is a no-op because there is no read/write, but not intercepting would cause fcntl() to return -1 - rate limit and put under netmap.verbose some messages that occur when threads use concurrently the same file descriptor.
|
261909 |
15-Feb-2014 |
luigi |
This new version of netmap brings you the following:
- netmap pipes, providing bidirectional blocking I/O while moving 100+ Mpps between processes using shared memory channels (no mistake: over one hundred million. But mind you, i said *moving* not *processing*);
- kqueue support (BHyVe needs it);
- improved user library. Just the interface name lets you select a NIC, host port, VALE switch port, netmap pipe, and individual queues. The upcoming netmap-enabled libpcap will use this feature.
- optional extra buffers associated to netmap ports, for applications that need to buffer data yet don't want to make copies.
- segmentation offloading for the VALE switch, useful between VMs.
and a number of bug fixes and performance improvements.
My colleagues Giuseppe Lettieri and Vincenzo Maffione did a substantial amount of work on these features so we owe them a big thanks.
There are some external repositories that can be of interest:
https://code.google.com/p/netmap our public repository for netmap/VALE code, including linux versions and other stuff that does not belong here, such as python bindings.
https://code.google.com/p/netmap-libpcap a clone of the libpcap repository with netmap support. With this any libpcap client has access to most netmap features with no recompilation. E.g. tcpdump can filter packets at 10-15 Mpps.
https://code.google.com/p/netmap-ipfw a userspace version of ipfw+dummynet which uses netmap to send/receive packets. Speed is up in the 7-10 Mpps range per core for simple rulesets.
Both netmap-libpcap and netmap-ipfw will be merged upstream at some point, but while this happens it is useful to have access to them.
And yes, this code will be merged soon. It is infinitely better than the version currently in 10 and 9.
MFC after: 3 days
|
260700 |
16-Jan-2014 |
luigi |
netmap_user.h: add separate rx/tx ring indexes add ring specifier in nm_open device name
netmap.c, netmap_vale.c more consistent errno numbers
netmap_generic.c correctly handle failure in registering interfaces.
tools/tools/netmap/ massive cleanup of the example programs (a lot of common code is now in netmap_user.h.)
nm_util.[ch] are going away soon. pcap.c will also go when i commit the native netmap support for libpcap.
|
260516 |
10-Jan-2014 |
luigi |
Fix netmap emulation when NICs attached to a VALE switch have a different number of tx and rx rings
Submitted by: Vincenzo Maffione
|
260515 |
10-Jan-2014 |
luigi |
sync with our internal repo - small change in debugging messages
|
260462 |
09-Jan-2014 |
glebius |
Fix build with VIMAGE.
|
260411 |
07-Jan-2014 |
luigi |
fix use after free when releasing a netmap adapter.
Submitted by: Giuseppe Lettieri
|
260368 |
06-Jan-2014 |
luigi |
It is 2014 and we have a new version of netmap. Most relevant features:
- netmap emulation on any NIC, even those without native netmap support.
On the ixgbe we have measured about 4Mpps/core/queue in this mode, which is still a lot more than with sockets/bpf.
- seamless interconnection of VALE switch, NICs and host stack.
If you disable accelerations on your NIC (say em0)
ifconfig em0 -txcsum -rxcsum
you can use the VALE switch to connect the NIC and the host stack:
vale-ctl -h valeXX:em0
allowing sharing the NIC with other netmap clients.
- THE USER API HAS SLIGHTLY CHANGED (head/cur/tail pointers instead of pointers/count as before). This was unavoidable to support, in the future, multiple threads operating on the same rings. Netmap clients require very small source code changes to compile again. On the plus side, the new API should be easier to understand and the internals are a lot simpler.
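The head/cur/tail scheme mentioned in the last item can be illustrated with a toy ring. This is a minimal sketch, not the code in netmap_user.h: `struct toy_ring` and the `toy_ring_*()` helpers are hypothetical, and the sketch assumes the application owns the slots from `head` up to, but not including, `tail`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a ring: only the three indexes matter here. */
struct toy_ring {
	uint32_t num_slots;	/* total slots in the circular ring */
	uint32_t head;		/* first slot owned by the application */
	uint32_t cur;		/* wakeup point, between head and tail */
	uint32_t tail;		/* first slot NOT owned by the application */
};

/* Next index, wrapping at the end of the ring. */
static inline uint32_t
toy_ring_next(const struct toy_ring *r, uint32_t i)
{
	assert(i < r->num_slots);
	return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Number of slots in [head, tail), taking wraparound into account. */
static inline uint32_t
toy_ring_space(const struct toy_ring *r)
{
	if (r->tail >= r->head)
		return r->tail - r->head;
	return r->tail + r->num_slots - r->head;
}

/* Use up to 'want' slots, then release them by advancing head and cur. */
static inline uint32_t
toy_ring_consume(struct toy_ring *r, uint32_t want)
{
	uint32_t done = 0;

	while (done < want && r->head != r->tail) {
		r->head = toy_ring_next(r, r->head);
		done++;
	}
	r->cur = r->head;
	return done;
}
```

In this sketch the application never writes `tail`; only the other side of the ring would move it, which is what keeps the ownership of each slot unambiguous.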
The manual page has been updated extensively to reflect the current features and give some examples.
This is the result of work of several people including Giuseppe Lettieri, Vincenzo Maffione, Michio Honda and myself, and has been financially supported by EU projects CHANGE and OPENLAB, from NetApp University Research Fund, NEC, and of course the Universita` di Pisa.
|
259538 |
18-Dec-2013 |
glebius |
Fix build.
|
259487 |
16-Dec-2013 |
luigi |
fix the build using __builtin_prefetch() instead of redefining prefetch()
|
259412 |
15-Dec-2013 |
luigi |
split netmap code according to functions:
- netmap.c base code
- netmap_freebsd.c FreeBSD-specific code
- netmap_generic.c emulate netmap over standard drivers
- netmap_mbq.c simple mbuf tailq
- netmap_mem2.c memory management
- netmap_vale.c VALE switch
simplify device-specific code
|
257758 |
06-Nov-2013 |
luigi |
remove a debugging message
|
257666 |
05-Nov-2013 |
luigi |
remove some test code.
|
257665 |
05-Nov-2013 |
luigi |
fix a bug when a device has 1 tx (or rx) queue and more than one queue of a different type.
Submitted by: Vincenzo Maffione MFC after: 3 days
|
257664 |
05-Nov-2013 |
luigi |
check errors on return from netmap_attach()
Submitted by: Giuseppe Lettieri MFC after: 3 days
|
257550 |
02-Nov-2013 |
luigi |
circumvent a couple of warnings: - on line 2550 intentionally overriding a const qualifier - on line 3219 intentionally converting uint64_t to a pointer
|
257537 |
02-Nov-2013 |
luigi |
add missing file from previous netmap update...
|
257529 |
01-Nov-2013 |
luigi |
update to the latest netmap snapshot. This includes the following:
- use separate memory regions for VALE ports
- locking fixes
- some simplifications in the NIC-specific routines
- performance improvements for the VALE switch
- some new features in the pkt-gen test program
- documentation updates
There are small API changes that require programs to be recompiled (NETMAP_API has been bumped so you will detect old binaries at runtime).
In particular: - struct netmap_slot now is 16 bytes to support an extra pointer, which may save one data copy when using VALE ports or VMs; - the struct netmap_if has two extra fields;
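The 16-byte slot mentioned above can be pictured with a small stand-in structure. This is a sketch modeled on that description; `struct toy_slot` is a hypothetical name, and the field comments are this example's own reading rather than text from the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a ring slot: 8 bytes of metadata plus one 64-bit field. */
struct toy_slot {
	uint32_t buf_idx;	/* index of the buffer attached to this slot */
	uint16_t len;		/* length of the packet in the buffer */
	uint16_t flags;		/* per-slot flags */
	uint64_t ptr;		/* the extra pointer-sized field */
};

/* Fail the build if padding or a new field changes the slot size. */
static_assert(sizeof(struct toy_slot) == 16, "slot must stay 16 bytes");
static_assert(offsetof(struct toy_slot, ptr) == 8, "ptr follows the metadata");
```

A compile-time size check of this kind is one way to notice a layout change early, alongside the runtime NETMAP_API check the message describes.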
MFC after: 3 days
|
257176 |
26-Oct-2013 |
glebius |
The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare for this event by adding if_var.h to files that do need it. Also, add all the includes that are currently picked up only through implicit pollution via if_var.h
Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
256200 |
09-Oct-2013 |
jfv |
Update the Intel igb driver to version 2.4.0 - This version has support for the new Intel Avoton systems, including 2.5Gb support, further it now has IPv6/TSO6 support as well. Shared code has been updated where necessary as well. Thanks to my new assistant Eric Joyner for doing the transmit path changes to bring in the IPv6/TSO6 support. Thanks to Gleb for catching the one bug and change needed in NETMAP.
Approved by: re
|
251425 |
05-Jun-2013 |
luigi |
- fix a bug in the previous commit that was dropping the last packet from each batch flowing on the VALE switch
- feature: add glue for 'indirect' buffers on the sender side: if a slot has NS_INDIRECT set, the netmap buffer contains pointer(s) to the actual userspace buffers, which are accessed with copyin(). The feature is not finalised yet, as it will likely need to deal with some iovec variant for proper scatter/gather support. This will save one copy for clients (e.g. qemu) that cannot use the netmap buffer directly.
A curiosity: on amd64 copyin() appears to be 10-15% faster than pkt_copy() or bcopy() at least for sizes of 256 and greater.
|
251139 |
30-May-2013 |
luigi |
Bring in a number of new features, mostly implemented by Michio Honda:
- the VALE switch now supports up to 254 destinations per switch, unicast or broadcast (multicast goes to all ports).
- we can attach hw interfaces and the host stack to a VALE switch, which means we will be able to use it more or less as a native bridge (minor tweaks still necessary). A 'vale-ctl' program is supplied in tools/tools/netmap to attach/detach ports to the switch, and list current configuration.
- the lookup function in the VALE switch can be reassigned to something else, similar to the pf hooks. This will enable attaching the firewall, or other processing functions (e.g. in-kernel openvswitch) directly on the netmap port.
The internal API used by device drivers does not change.
Userspace applications should be recompiled because we bump NETMAP_API as we now use some fields in the struct nmreq that were previously ignored -- otherwise, data structures are the same.
Manpages will be committed separately.
|
250441 |
10-May-2013 |
luigi |
another minor bugfix in the memory allocator, this time in the free routine.
|
250184 |
02-May-2013 |
luigi |
remove trailing whitespace
|
250107 |
30-Apr-2013 |
luigi |
Partial cleanup in preparation for upcoming changes:
- netmap_rx_irq()/netmap_tx_irq() can now be called by FreeBSD drivers hiding the logic for handling NIC interrupts in netmap mode. This also simplifies the case of NICs attached to VALE switches. Individual drivers will be updated with separate commits.
- use the same refcount() API for FreeBSD and linux
- plus some comments, typos and formatting fixes
Portions contributed by Michio Honda
|
250054 |
29-Apr-2013 |
luigi |
whitespace - document alternative locking under linux
|
250052 |
29-Apr-2013 |
luigi |
whitespace changes: remove $Id$ lines, and add blank lines around some #if / #elif /#endif
|
250049 |
29-Apr-2013 |
luigi |
explicitly mark some variables as const
|
249659 |
19-Apr-2013 |
luigi |
mostly whitespace changes: - remove vestiges of the old memory allocator - clean up some comments
|
249504 |
15-Apr-2013 |
luigi |
fix a bug in the computation of the userspace offset for a given netmap buffer.
Submitted by: Hugh Nhan
|
248084 |
09-Mar-2013 |
attilio |
Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes.
The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs.
The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (I can think off-hand of VirtualBox, for example).
Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
|
245836 |
23-Jan-2013 |
luigi |
Add support for transparent mode while in netmap.
By setting dev.netmap.fwd=1 (or enabling the feature with a per-ring flag), packets are forwarded between the NIC and the host stack unless the netmap client clears the NS_FORWARD flag on the individual descriptors.
This feature greatly simplifies applications where some traffic (think of ARP, control traffic, ssh sessions...) must be processed by the host stack, whereas the bulk is handled by the netmap process which simply (un)marks packets that should not be forwarded. The default is chosen so that now a netmap receiver operates in a mode very similar to bpf.
Of course there is no free lunch: traffic to/from the host stack still operates at OS speed (or less, as there is one extra copy in one direction). HOWEVER, since traffic goes to the user process before being reinjected, and reinjection occurs in a user context, you get some form of livelock protection for free.
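The per-descriptor marking described above can be sketched as follows. The sketch assumes forwarding is enabled, so every descriptor starts with the flag set; `TOY_NS_FORWARD`, `struct toy_desc` and `toy_select_forwarded()` are hypothetical stand-ins, and the `for_client` field represents whatever classification the application performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NS_FORWARD	0x0004	/* stand-in for the NS_FORWARD slot flag */

/* Hypothetical descriptor: just enough fields for the example. */
struct toy_desc {
	uint16_t len;
	uint16_t flags;
	bool for_client;	/* set by the application's own classifier */
};

/*
 * Clear the flag on the packets the client keeps for itself and return
 * how many descriptors stay marked, i.e. would be passed to the other side.
 */
static inline unsigned
toy_select_forwarded(struct toy_desc *d, unsigned n)
{
	unsigned forwarded = 0;

	assert(d != NULL || n == 0);
	for (unsigned i = 0; i < n; i++) {
		if (d[i].for_client)
			d[i].flags &= (uint16_t)~TOY_NS_FORWARD;
		if (d[i].flags & TOY_NS_FORWARD)
			forwarded++;
	}
	return forwarded;
}
```

The client only has to touch the descriptors it wants to keep; everything else (ARP, ssh and similar traffic in the description above) goes through untouched.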
|
245835 |
23-Jan-2013 |
luigi |
control some debugging messages with dev.netmap.verbose
add infrastructure to adapt to changes in number of queues and buffers at runtime
|
245581 |
17-Jan-2013 |
luigi |
remove the old memory allocator, not useful anymore
|
245579 |
17-Jan-2013 |
luigi |
add some definition and driver changes in preparation for two upcoming features:
semi-transparent mode: when a device is opened in this mode, the user program will be able to mark slots that must be forwarded to the "other" side (i.e. from NIC to host stack, or vice versa), and the forwarding will occur automatically at the next netmap syscall. This saves the need to open another file descriptor and do the forwarding manually.
direct-forwarding mode: when operating with a VALE port, the user can specify in the slot the actual destination port, overriding the forwarding decision made by a lookup of the destination MAC. This can be useful to implement packet dispatchers.
No API changes will be introduced. No new functionality in this patch yet.
|
245570 |
17-Jan-2013 |
luigi |
remove an incorrect comment and debugging code
|
244514 |
20-Dec-2012 |
luigi |
rename the 'tag' and 'map' fields used in the rx ring to their previous names, 'ptag' and 'pmap' -- p stands for packet.
This change reduces the difference between the code in stable/9 and head, and also helps using the same ixgbe_netmap.h on both branches.
Approved by: Jack Vogel
|
243714 |
30-Nov-2012 |
jfv |
First of a series of 11 patches leading to new ixgbe version 2.5.0 This removes the header split and supporting code from the driver.
|
241750 |
19-Oct-2012 |
emaste |
Use M_NOWAIT when calling malloc with a lock held.
The check for a NULL return was already in place so I assume this was just an oversight.
|
241723 |
19-Oct-2012 |
glebius |
Fix build.
|
241719 |
19-Oct-2012 |
luigi |
This is an import of code, mostly from Giuseppe Lettieri, that revises the netmap memory allocator so that the various parameters (number and size of buffers, rings, descriptors) can be modified at runtime through sysctl variables. The changes become effective when no netmap clients are active.
The API is mostly unchanged, although the NIOCUNREGIF ioctl no longer brings the interface back to normal mode; you need to close the file descriptor for that. This change was necessary to track who is using the mapped region, and since it is a simplification of the API there was no incentive in trying to preserve NIOCUNREGIF. We will remove the ioctl from the kernel next time we need a real API change (and version bump).
Among other things, buffer allocation when opening devices is now much faster: it used to take O(N^2) time, now it is linear.
Submitted by: Giuseppe Lettieri
|
241643 |
17-Oct-2012 |
emaste |
Avoid panic when a netmap instance cannot obtain memory.
A uint32_t is always >= 0.
Sponsored by: ADARA Networks
|
239242 |
13-Aug-2012 |
emaste |
Reword comment to try to improve clarity, and fix a typo.
|
239149 |
09-Aug-2012 |
emaste |
Improve lock and unlock symmetry
- Move destruction of per-ring locks to netmap_dtor_locked to mirror the initialization that happens in NIOCREGIF. Otherwise unloading a netmap- capable interface that was never put into netmap mode would try to mtx_destroy an uninitialized mutex, and panic.
- Destroy core_lock in netmap_detach, mirroring init in netmap_attach.
- Also comment out the knlist_destroy for now as there is currently no knlist_init.
Sponsored by: ADARA Networks Reviewed by: luigi@
|
239141 |
08-Aug-2012 |
emaste |
Fix whitespace (missing newline)
|
239140 |
08-Aug-2012 |
emaste |
Clarify comments about number of tx / rx rings
|
238985 |
02-Aug-2012 |
luigi |
fix some signed/unsigned warnings in the netmap code. Unfortunately the original drivers still have a lot of sign conversion/comparison warnings.
|
238982 |
02-Aug-2012 |
luigi |
Add a newline on an error message; rename linux functions to avoid confusion; fix error reporting on linux
|
238937 |
31-Jul-2012 |
luigi |
remove a redundant MALLOC_DECLARE
|
238912 |
30-Jul-2012 |
luigi |
- move the inclusion of netmap headers to the common part of the code; - more portable annotations for unused arguments;
|
238837 |
27-Jul-2012 |
luigi |
use __builtin_prefetch() for prefetch.
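A minimal sketch of mapping a prefetch macro onto the compiler builtin, assuming GCC or Clang. The names `toy_prefetch()` and `toy_sum()` and the look-ahead distance of 8 elements are illustrative choices made for this example, not taken from the netmap sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The builtin is only a hint: it never faults and may be ignored. */
#define toy_prefetch(p)	__builtin_prefetch(p)

/* Sum an array while hinting the element a few steps ahead. */
static inline uint64_t
toy_sum(const uint32_t *v, size_t n)
{
	uint64_t sum = 0;

	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (i + 8 < n)
			toy_prefetch(&v[i + 8]);
		sum += v[i];
	}
	return sum;
}
```

Since the builtin is provided by the compiler, a macro of this shape needs no per-architecture definition.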
merge in the remaining part of the linux-specific glue so i do not need to maintain two different distributions.
|
238831 |
27-Jul-2012 |
luigi |
remove unused definition, whitespace cleanup
|
238818 |
26-Jul-2012 |
luigi |
define prefetch as a noop on !x86
|
238812 |
26-Jul-2012 |
luigi |
Add support for VALE bridges to the netmap core, see
http://info.iet.unipi.it/~luigi/vale/
VALE lets you dynamically instantiate multiple software bridges that talk the netmap API (and are *extremely* fast), so you can test netmap applications without the need for high end hardware.
This is particularly useful as I am completing a netmap-aware version of ipfw, and VALE provides an excellent testing platform.
I also have netmap backends for qemu mostly ready for commit to the port, and this too will let you interconnect virtual machines at high speed without fiddling with bridges, tap or other slow solutions.
The API for applications is unchanged, so you can use the code in tools/tools/netmap (which i will update soon) on the VALE ports.
This commit also syncs the code with the one in my internal repository, so you will see some conditional code for other platforms. The code should run mostly unmodified on stable/9 so people interested in trying it can just copy sys/dev/netmap/ and sys/net/netmap*.h from HEAD
VALE is joint work with my colleague Giuseppe Lettieri, and is partly supported by the EU Projects CHANGE and OPENLAB
|
235562 |
17-May-2012 |
luigi |
this file is too old and not interesting anymore now that netmap has been MFC'ed.
|
234986 |
03-May-2012 |
luigi |
print 'netmap stack ring full' only in verbose mode.
|
234290 |
14-Apr-2012 |
luigi |
i prefer this fix for the -Wformat warning (just one cast, all the other variables are already correct for %x). My previous attempt put the cast in the wrong place.
|
234283 |
14-Apr-2012 |
bz |
Make compile on 64bit somehow for now after a first try at r234242 on maybe 32bit?
|
234242 |
13-Apr-2012 |
luigi |
fix build with -Wformat -Wmissing-prototypes
|
234229 |
13-Apr-2012 |
luigi |
Properly disable crc stripping when operating in netmap mode.
Contrary to what I wrote in my previous commit, the 82599 does include the CRC in the length. The operating mode is reset in ixgbe_init_locked() and so we need to hook into the places where the two registers (HLREG0 and RDRXCTL) are modified.
|
234228 |
13-Apr-2012 |
luigi |
add the new memory allocator for netmap, which allocates memory in small clusters instead of one big contiguous chunk. This was already enabled in the previous commit.
|
234227 |
13-Apr-2012 |
luigi |
A bit of cleanup in the names of fields of netmap-related structures. Use the name 'ring' instead of 'queue' in all fields. Bump NETMAP_API.
|
234225 |
13-Apr-2012 |
luigi |
do not use a deprecated field in a structure.
|
234185 |
12-Apr-2012 |
luigi |
Apparently the length field in advanced descriptors does not include the CRC irrespective of the setting of CRCSTRIP. The 82599 data sheets (sec. 7.1.6) say differently. Very strange. Need to check what happens on legacy descriptors, but for the time being this restores functionality.
|
234174 |
12-Apr-2012 |
luigi |
Some code restructuring to bring the memory allocator out of netmap.c and make it easier to replace it with a different implementation. On passing, also fix indentation.
NOTE: I know that #include "foo.c" is ugly, but the alternative (add another entry to sys/conf/files, add a separate header with structs and prototypes, and expose functions that are meant to be private) looks even worse to me. We need a more modular way to specify dependencies and build options.
|
234169 |
12-Apr-2012 |
luigi |
use correct selinfo pointer for the generic interrupt handler (it is never used in current FreeBSD drivers).
|
234140 |
11-Apr-2012 |
luigi |
A couple of changes related to ixgbe operation in netmap mode:
- add a sysctl, dev.netmap.ix_crcstrip, to control whether ixgbe should strip the CRC on received frames. Defaults to 0, which keeps the CRC and improves performance when receiving min-sized (64-byte) frames. This matters because min-sized frames are one of the standard benchmarks for switches and routers; some chipsets seem to issue read-modify-write cycles for PCIe transactions that are not a full cache line, and a min-sized frame triggers the bug, resulting in reduced throughput -- 9.7 instead of 14.88 Mpps -- and heavy bus load.
- for the time being, always look for incoming packets on a select/poll even if there has not been an interrupt in the meantime. This is only a temporary workaround for a probable race condition in keeping track of rx interrupts. Add a couple of diagnostic vars to help studying the problem.
|
232238 |
27-Feb-2012 |
luigi |
A bunch of netmap fixes:
USERSPACE: 1. add support for devices with different number of rx and tx queues;
2. add better support for zero-copy operation, adding an extra field to the netmap ring to indicate how many buffers we have already processed but not yet released (with help from Eddie Kohler);
3. The two changes above unfortunately require an API change, so while at it add a version field and some spares to the ioctl() argument to help detect mismatches.
4. update the manual page for the two changes above;
5. update sample applications in tools/tools/netmap
KERNEL:
1. simplify the internal structures moving the global wait queues to the 'struct netmap_adapter';
2. simplify the functions that map kring<->nic ring indexes
3. normalize device-specific code, helps maintenance;
4. start exploring the impact of micro-optimizations (prefetch etc.) in the ixgbe driver. Using 'legacy' descriptors on the tx ring and prefetching slots gives about 20% speedup at 900 MHz. Another 7-10% would come from removing the explicit calls to bus_dmamap* in the core (they are effectively NOPs in this case, but it takes an expensive load of the per-buffer dma maps to figure out that they are all NULL).
Rx performance not investigated.
I am postponing the MFC so i can import a few more improvements before merging.
|
231881 |
17-Feb-2012 |
luigi |
Various cleanups for readability (no functional changes)
- remove the KEVENT code, which was incomplete and not compiled anyways; - change some while() loops into for() - adjust indentation - remove extra whitespace
MFC after: 1 week
|
231796 |
15-Feb-2012 |
luigi |
(This commit only touches code within the DEV_NETMAP blocks)
Introduce some functions to map NIC ring indexes into netmap ring indexes and vice versa. This way we can implement the bound checks only in one place (and hopefully in a correct way).
On passing, make the code and comments more uniform across the various drivers.
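The index mapping described above can be sketched as a pair of helpers that apply an offset with wraparound, keeping the bound checks in one place. The function names and the explicit `ofs` and `num_slots` arguments are assumptions of this example; the in-tree helpers are not reproduced here.

```c
#include <assert.h>

/*
 * Hypothetical helper: translate a NIC ring index into a netmap ring
 * index. 'ofs' is the offset between the two rings, 'num_slots' the
 * ring size. A single correction is enough because |ofs| < num_slots.
 */
static inline int
toy_idx_n2k(int nic_idx, int ofs, int num_slots)
{
	int k;

	assert(num_slots > 0);
	assert(nic_idx >= 0 && nic_idx < num_slots);
	assert(ofs > -num_slots && ofs < num_slots);

	k = nic_idx + ofs;
	if (k < 0)
		return k + num_slots;
	if (k >= num_slots)
		return k - num_slots;
	return k;
}

/* Inverse mapping: netmap ring index back to NIC ring index. */
static inline int
toy_idx_k2n(int k_idx, int ofs, int num_slots)
{
	return toy_idx_n2k(k_idx, -ofs, num_slots);
}
```

Defining the inverse in terms of the forward mapping keeps the wraparound logic in a single function, in the spirit of the "only in one place" remark above.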
|
231778 |
15-Feb-2012 |
luigi |
reduce the differences between these three files. The three drivers (em, lem and igb) are extremely similar, too bad that the structures use different names and we cannot share the code.
|
231594 |
13-Feb-2012 |
luigi |
- use struct ifnet as explicit type of the argument to the txsync() and rxsync() callbacks, removing some variables made useless by this change;
- add generic lock and irq handling routines. These can be useful in case there are no driver locks that we can reuse;
- add a few macros to reduce differences with the Linux version.
|
231198 |
08-Feb-2012 |
luigi |
- change the buffer size from a constant to a TUNABLE variable (hw.netmap.buf_size) so we can experiment with values different from 2048 which may give better cache performance.
- rearrange the memory allocation code so it will be easier to replace it with a different implementation. The current code relies on a single large contiguous chunk of memory obtained through contigmalloc. The new implementation (not committed yet) uses multiple smaller chunks which are easier to fit in a fragmented address space.
|
230572 |
26-Jan-2012 |
luigi |
ixgbe changes: - remove experimental code for disabling CRC - use the correct constant for conversion between interrupt rate and EITR values (the previous values were off by a factor of 2) - make dev.ix.N.queueM.interrupt_rate a RW sysctl variable. Changing individual values affects the queue immediately, and propagates to all interfaces at the next reinit. - add dev.ix.N.queueM.irqs rdonly sysctl, to export the actual interrupt counts
Netmap-related changes for ixgbe: - use the "new" format for TX descriptors in netmap mode. - pass interrupt mitigation delays to the user process doing poll() on a netmap file descriptor. On the RX side this means we will not check the ring more than once per interrupt. This gives the process a chance to sleep and process packets in larger batches, thus reducing CPU usage. On the TX side we take this even further: completed transmissions are reclaimed every half ring even if the NIC interrupts more often. This saves even more CPU without any additional tx delays.
Generic Netmap-related changes: - align the netmap_kring to cache lines so that there is no false sharing (possibly useful for multiqueue NICs and MSIX interrupts, which are handled by different cores). It's a minor improvement but it does not cost anything.
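The cache-line alignment mentioned above can be sketched with standard C11 facilities. This is an illustration only: `struct toy_kring`, its fields and the 64-byte line size are assumptions made for the example, and the kernel code may well use a compiler attribute rather than `alignas`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_LINE	64	/* assumed cache line size, typical on x86 */

/* Hypothetical per-ring state; aligning one member aligns the whole struct. */
struct toy_kring {
	alignas(TOY_CACHE_LINE) uint32_t hwcur;
	uint32_t hwavail;
	uint32_t num_slots;
};

/* One entry per queue; different cores would update different entries. */
static struct toy_kring toy_krings[4];

static inline struct toy_kring *
toy_kring_at(size_t i)
{
	assert(i < sizeof(toy_krings) / sizeof(toy_krings[0]));
	return &toy_krings[i];
}

/* The size is padded up to the alignment, so array entries never share a line. */
static_assert(alignof(struct toy_kring) == TOY_CACHE_LINE, "alignment");
static_assert(sizeof(struct toy_kring) % TOY_CACHE_LINE == 0, "padding");
```

The cost is some padding per entry, which is small next to the rings themselves and consistent with the "does not cost anything" remark above.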
Reviewed by: Jack Vogel Approved by: Jack Vogel
|
230058 |
13-Jan-2012 |
luigi |
indentation and whitespace fixes
|
230055 |
13-Jan-2012 |
luigi |
fix indentation
|
230052 |
13-Jan-2012 |
luigi |
Two performance-related fixes: 1. as reported by Alexander Fiveg, the allocator was reporting half of the allocated memory. Fix this by exiting from the loop earlier (not too critical because this code is going away soon).
2. following a discussion on freebsd-current http://lists.freebsd.org/pipermail/freebsd-current/2012-January/031144.html turns out that (re)loading the dmamap was expensive and not optimized. This operation is in the critical path when doing zero-copy forwarding between interfaces. At least on netmap and i386/amd64, the bus_dmamap_load can be completely bypassed if the map is NULL, so we do it.
The latter change gives an almost 3x improvement in forwarding performance, from the previous 9.5Mpps at 2.9GHz to the current line rate (14.2Mpps) at 1.733GHz. (this is for 64+4 byte packets, in other configurations the PCIe bus is a bottleneck).
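The packet rates quoted above follow from simple wire arithmetic, sketched below. The sketch assumes standard Ethernet framing, where each frame also occupies 20 bytes on the wire (8 of preamble/SFD and 12 of inter-frame gap); `line_rate_pps()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Packets per second on a link of 'link_bps' for frames of 'frame_bytes'. */
static inline uint64_t
line_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
	const uint32_t overhead_bytes = 8 + 12;	/* preamble/SFD + inter-frame gap */

	assert(frame_bytes > 0);
	return link_bps / ((uint64_t)(frame_bytes + overhead_bytes) * 8);
}
```

With a 10 Gbit/s link, 64 bytes gives 10^10 / 672, about 14.88 Mpps, and 68 bytes (reading "64+4" as a 68-byte frame) gives 10^10 / 704, about 14.2 Mpps, matching the line rate quoted in the message.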
|
229947 |
10-Jan-2012 |
luigi |
other simplifications in the internal interfaces to the memory allocator.
|
229939 |
10-Jan-2012 |
luigi |
small code cleanup in preparation for future modifications in the memory allocator used by netmap. No functional change, two small bug fixes: - in if_re.c add a missing bus_dmamap_sync() - in netmap.c comment out a spurious free() in an error handling block
|
228881 |
25-Dec-2011 |
luigi |
remove a variable definition which shadows the correct one.
Submitted by: Eitan Adler
|
228845 |
23-Dec-2011 |
luigi |
1. don't use if_pspare directly, but through a macro WMA()
2. move a variable declaration at the beginning of a block
|
228844 |
23-Dec-2011 |
luigi |
whitespace fixes (one missing newline, one extra tab)
|
228694 |
18-Dec-2011 |
marius |
Fix compilation on sparc64 by actually supplying the bus_dma_tag_t member of the rx_ring to bus_dmamap_sync(9). Given that netmap code tries to obtain the bus addresses of netmap buffers via vtophys(9) instead of using bus_dma(9) it currently has zero chance of actually working on sparc64 though (and for that matter f.e. also not with MACs limited to 32-bit DMA on x86 machines with more than 4GB of RAM).
|
228280 |
05-Dec-2011 |
luigi |
revise the implementation of the rings connected to the host stack
|
228276 |
05-Dec-2011 |
luigi |
1. Fix the handling of link reset while in netmap mode. A link reset now is completely transparent for the netmap client: even if the NIC resets its own ring (e.g. restarting from 0), the client will not see any change in the current rx/tx positions, because the driver will keep track of the offset between the two.
2. make the device-specific code more uniform across different drivers. There were some inconsistencies in the implementation of the netmap support routines, now drivers have been aligned to a common code structure.
3. import netmap support for ixgbe . This is implemented as a very small patch for ixgbe.c (233 lines, 11 chunks, mostly comments: in total the patch has only 54 lines of new code) , as most of the code is in an external file sys/dev/netmap/ixgbe_netmap.h , following some initial comments from Jack Vogel about making changes less intrusive. (Note, i have emailed Jack multiple times asking if he had comments on this structure of the code; i got no reply so i assume he is fine with it).
Support for other drivers (em, lem, re, igb) will come later.
"ixgbe" is now the reference driver for netmap support. Both the external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should serve as a reference for other device drivers.
Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap; the sender does 14.88 Mpps at 1050 MHz and 14.2 Mpps at 900 MHz on an i7-860 with 4 cores and 82599 card. I haven't yet tried more aggressive optimizations such as adding 'prefetch' instructions in the time-critical parts of the code.
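The offset tracking described in item 1 above can be sketched as follows. This is a minimal standalone illustration under stated assumptions, not the driver code: the structure and function names (ring_map, ring_wrap, nic_to_client, client_to_nic, on_link_reset) are hypothetical.

```c
#include <assert.h>

/* Hypothetical bookkeeping for one ring; not the actual netmap structures. */
struct ring_map {
	int num_slots;	/* number of slots in the ring */
	int hwofs;	/* client index minus NIC index, modulo num_slots */
};

/* Reduce idx into [0, n), including for negative values. */
static int
ring_wrap(int idx, int n)
{
	assert(n > 0);
	idx %= n;
	return (idx < 0) ? idx + n : idx;
}

/* Translate a NIC ring index into the index the client sees. */
static int
nic_to_client(const struct ring_map *m, int nic_idx)
{
	return ring_wrap(nic_idx + m->hwofs, m->num_slots);
}

/* Translate a client-visible index into the NIC ring index. */
static int
client_to_nic(const struct ring_map *m, int client_idx)
{
	return ring_wrap(client_idx - m->hwofs, m->num_slots);
}

/*
 * After a reset the NIC restarts at nic_idx (e.g. 0) while the client
 * remains at client_cur; record the difference so later translations
 * keep the client's view unchanged.
 */
static void
on_link_reset(struct ring_map *m, int client_cur, int nic_idx)
{
	m->hwofs = ring_wrap(client_cur - nic_idx, m->num_slots);
}
```

For example, with a 256-slot ring and the client at slot 37 when the NIC restarts from 0, on_link_reset() stores an offset of 37, so NIC slot 0 is presented to the client as slot 37 and the client's position does not move.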
|
227875 |
23-Nov-2011 |
luigi |
fix formatting warning using casts. The numbers involved are small and these are debug statements, so there is no reason to obfuscate the format string with PRIsomeKINDofINTEGER
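The trade-off mentioned above can be illustrated with a short sketch. The helper name format_debug_len is hypothetical and the code is not taken from netmap; it only shows a cast used to keep the format string plain, with the PRI-macro alternative noted in a comment.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical debug helper: the value is known to be small, so it is
 * cast to int and printed with a plain "%d".
 */
static int
format_debug_len(char *buf, size_t buflen, uint64_t len)
{
	/* The cast is only safe because the value fits in an int. */
	assert(len <= INT_MAX);
	/*
	 * Alternative without the cast, using <inttypes.h>:
	 *   snprintf(buf, buflen, "len %" PRIu64, len);
	 */
	return snprintf(buf, buflen, "len %d", (int)len);
}
```

The cast silences the format warning on platforms where the underlying type is not int, at the cost of assuming the value is small, which is acceptable for debug output.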
|
227614 |
17-Nov-2011 |
luigi |
Bring in support for netmap, a framework for very efficient packet I/O from userspace, capable of line rate at 10G, see
http://info.iet.unipi.it/~luigi/netmap/
At this time I am bringing in only the generic code (sys/dev/netmap/ plus two headers under sys/net/), and some sample applications in tools/tools/netmap. There is also a manpage in share/man/man4 [1].
In order to make use of the framework you need to build a kernel with "device netmap", and patch individual drivers with the code that you can find in
sys/dev/netmap/head.diff
The file will go away as the relevant pieces are committed to the various device drivers, which should happen in a few days after talking to the driver maintainers.
Netmap support is available at the moment for Intel 10G and 1G cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re"). I have partial patches for "bge" and am starting to work on "cxgbe". Hopefully changes are trivial enough so interested third parties can submit their patches. Interested people can contact me for advice on how to add netmap support to specific devices.
CREDITS: Netmap has been developed by Luigi Rizzo and other collaborators at the Università di Pisa, and supported by EU project CHANGE (http://www.change-project.eu/). The code is distributed under a BSD Copyright.
[1] In my opinion it is a bad idea to have all manpages in one directory. We should place kernel documentation in the same dir that contains the code, which would make it much simpler to keep doc and code in sync, reduce the clutter in share/man/ and incidentally is the policy used for all of userspace code. Makefiles and doc tools can be trivially adjusted to find the manpages in the relevant subdirs.
|