#
29077990 |
|
24-Jan-2024 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers Ice and i40e ZC drivers currently set offset of a frag within skb_shared_info to 0, which is incorrect. xdp_buffs that come from xsk_buff_pool always have 256 bytes of a headroom, so they need to be taken into account to retrieve xdp_buff::data via skb_frag_address(). Otherwise, bpf_xdp_frags_increase_tail() would be starting its job from xdp_buff::data_hard_start which would result in overwriting existing payload. Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support") Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support") Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-8-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
f7f6aa8e |
|
24-Jan-2024 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags XDP multi-buffer support introduced XDP_FLAGS_HAS_FRAGS flag that is used by drivers to notify data path whether xdp_buff contains fragments or not. Data path looks up mentioned flag on first buffer that occupies the linear part of xdp_buff, so drivers only modify it there. This is sufficient for SKB and XDP_DRV modes as usually xdp_buff is allocated on stack or it resides within struct representing driver's queue and fragments are carried via skb_frag_t structs. IOW, we are dealing with only one xdp_buff. ZC mode though relies on list of xdp_buff structs that is carried via xsk_buff_pool::xskb_list, so ZC data path has to make sure that fragments do *not* have XDP_FLAGS_HAS_FRAGS set. Otherwise, xsk_buff_free() could misbehave if it would be executed against xdp_buff that carries a frag with XDP_FLAGS_HAS_FRAGS flag set. Such scenario can take place when within supplied XDP program bpf_xdp_adjust_tail() is used with negative offset that would in turn release the tail fragment from multi-buffer frame. Calling xsk_buff_free() on tail fragment with XDP_FLAGS_HAS_FRAGS would result in releasing all the nodes from xskb_list that were produced by driver before XDP program execution, which is not what is intended - only tail fragment should be deleted from xskb_list and then it should be put onto xsk_buff_pool::free_list. Such multi-buffer frame will never make it up to user space, so from AF_XDP application POV there would be no traffic running, however due to free_list getting constantly new nodes, driver will be able to feed HW Rx queue with recycled buffers. Bottom line is that instead of traffic being redirected to user space, it would be continuously dropped. To fix this, let us clear the mentioned flag on xsk_buff_pool side during xdp_buff initialization, which is what should have been done right from the start of XSK multi-buffer support. Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support") Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support") Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-3-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
62589808 |
|
05-Dec-2023 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e: field get conversion Refactor the i40e driver to use FIELD_GET() for mask and shift reads, which reduces lines of code and adds clarity of intent. This code was generated by the following coccinelle/spatch script and then manually repaired. While making one of the conversions, an if() check was inverted to return early and avoid un-necessary indentation of the remainder of the function. In some other cases a stack variable was moved inside the block where it was used while doing cleanups/review. A couple places were changed to use le16_get_bits() instead of FIELD_GET with a le16_to_cpu combination. @get@ constant shift,mask; metavariable type T; expression a; @@ -(((T)(a) & mask) >> shift) +FIELD_GET(mask, a) and applied via: spatch --sp-file field_prep.cocci --in-place --dir \ drivers/net/ethernet/intel/ Cc: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
56df3459 |
|
27-Sep-2023 |
Ivan Vecera <ivecera@redhat.com> |
i40e: Remove circular header dependencies and fix headers Similarly as for ice driver [1] there are also circular header dependencies in i40e driver: i40e.h -> i40e_virtchnl_pf.h -> i40e.h Another issue is that i40e header files does not contain their own dependencies on other header files (both private and standard) so their inclusion in .c file require to add these deps in certain order to that .c file to make it compilable. Fix both issues by removal the mentioned circular dependency, by filling i40e headers with their dependencies so they can be placed anywhere in a source code. Additionally remove bunch of includes from i40e.h super header file that are not necessary and include i40e.h only in .c files that really require it. [1] 649c87c6ff52 ("ice: remove circular header dependencies on ice.h") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
913eda2b |
|
18-Oct-2023 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
i40e: xsk: remove count_mask Cited commit introduced a neat way of updating next_to_clean that does not require boundary checks on each increment. This was done by masking the new value with (ring length - 1) mask. Problem is that this is applicable only for power of 2 ring sizes, for every other size this assumption can not be made. In turn, it leads to cleaning descriptors out of order as well as splats: [ 1388.411915] Workqueue: events xp_release_deferred [ 1388.411919] RIP: 0010:xp_free+0x1a/0x50 [ 1388.411921] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 8b 57 70 48 8d 47 70 48 89 e5 48 39 d0 74 06 <5d> c3 cc cc cc cc 48 8b 57 60 83 82 b8 00 00 00 01 48 8b 57 60 48 [ 1388.411922] RSP: 0018:ffa0000000a83cb0 EFLAGS: 00000206 [ 1388.411923] RAX: ff11000119aa5030 RBX: 000000000000001d RCX: ff110001129b6e50 [ 1388.411924] RDX: ff11000119aa4fa0 RSI: 0000000055555554 RDI: ff11000119aa4fc0 [ 1388.411925] RBP: ffa0000000a83cb0 R08: 0000000000000000 R09: 0000000000000000 [ 1388.411926] R10: 0000000000000001 R11: 0000000000000000 R12: ff11000115829b80 [ 1388.411927] R13: 000000000000005f R14: 0000000000000000 R15: ff11000119aa4fc0 [ 1388.411928] FS: 0000000000000000(0000) GS:ff11000277e00000(0000) knlGS:0000000000000000 [ 1388.411929] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1388.411930] CR2: 00007f1f564e6c14 CR3: 000000000783c005 CR4: 0000000000771ef0 [ 1388.411931] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1388.411931] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1388.411932] PKRU: 55555554 [ 1388.411933] Call Trace: [ 1388.411934] <IRQ> [ 1388.411935] ? show_regs+0x6e/0x80 [ 1388.411937] ? watchdog_timer_fn+0x1d2/0x240 [ 1388.411939] ? __pfx_watchdog_timer_fn+0x10/0x10 [ 1388.411941] ? __hrtimer_run_queues+0x10e/0x290 [ 1388.411945] ? clockevents_program_event+0xae/0x130 [ 1388.411947] ? hrtimer_interrupt+0x105/0x240 [ 1388.411949] ? __sysvec_apic_timer_interrupt+0x54/0x150 [ 1388.411952] ? sysvec_apic_timer_interrupt+0x7f/0x90 [ 1388.411955] </IRQ> [ 1388.411955] <TASK> [ 1388.411956] ? asm_sysvec_apic_timer_interrupt+0x1f/0x30 [ 1388.411958] ? xp_free+0x1a/0x50 [ 1388.411960] i40e_xsk_clean_rx_ring+0x5d/0x100 [i40e] [ 1388.411968] i40e_clean_rx_ring+0x14c/0x170 [i40e] [ 1388.411977] i40e_queue_pair_disable+0xda/0x260 [i40e] [ 1388.411986] i40e_xsk_pool_setup+0x192/0x1d0 [i40e] [ 1388.411993] i40e_reconfig_rss_queues+0x1f0/0x1450 [i40e] [ 1388.412002] xp_disable_drv_zc+0x73/0xf0 [ 1388.412004] ? mutex_lock+0x17/0x50 [ 1388.412007] xp_release_deferred+0x2b/0xc0 [ 1388.412010] process_one_work+0x178/0x350 [ 1388.412011] ? __pfx_worker_thread+0x10/0x10 [ 1388.412012] worker_thread+0x2f7/0x420 [ 1388.412014] ? __pfx_worker_thread+0x10/0x10 [ 1388.412015] kthread+0xf8/0x130 [ 1388.412017] ? __pfx_kthread+0x10/0x10 [ 1388.412019] ret_from_fork+0x3d/0x60 [ 1388.412021] ? __pfx_kthread+0x10/0x10 [ 1388.412023] ret_from_fork_asm+0x1b/0x30 [ 1388.412026] </TASK> It comes from picking wrong ring entries when cleaning xsk buffers during pool detach. Remove the count_mask logic and use they boundary check when updating next_to_process (which used to be a next_to_clean). Fixes: c8a8ca3408dc ("i40e: remove unnecessary memory writes of the next to clean pointer") Reported-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Tested-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231018163908.40841-1-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
a92b96c4 |
|
19-Jul-2023 |
Tirthendu Sarkar <tirthendu.sarkar@intel.com> |
i40e: xsk: add TX multi-buffer support Set eop bit in TX desc command only for the last descriptor of the packet and do not set for all preceding descriptors. Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-17-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
1c9ba9c1 |
|
19-Jul-2023 |
Tirthendu Sarkar <tirthendu.sarkar@intel.com> |
i40e: xsk: add RX multi-buffer support This patch is inspired from the multi-buffer support in non-zc path for i40e as well as from the patch to support zc on ice. Each subsequent frag is added to skb_shared_info of the first frag for possible xdp_prog use as well to xsk buffer list for accessing the buffers in af_xdp. For XDP_PASS, new pages are allocated for frags and contents are copied from memory backed by xsk_buff_pool. Replace next_to_clean with next_to_process as done in non-zc path and advance it for every buffer and change the semantics of next_to_clean to point to the first buffer of a packet. Driver will use next_to_process in the same way next_to_clean was used previously. For the non multi-buffer case, next_to_process and next_to_clean will always be the same since each packet consists of a single buffer. Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-14-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
b028813a |
|
21-Jun-2023 |
Yueh-Shun Li <shamrocklee@posteo.net> |
i40e, xsk: fix comment typo Spell "transmission" properly. Found by searching for keyword "tranm". Signed-off-by: Yueh-Shun Li <shamrocklee@posteo.net> Link: https://lore.kernel.org/r/20230622012627.15050-3-shamrocklee@posteo.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
aae425ef |
|
12-Oct-2022 |
Jan Sokolowski <jan.sokolowski@intel.com> |
i40e: Fix DMA mappings leak During reallocation of RX buffers, new DMA mappings are created for those buffers. steps for reproduction: while : do for ((i=0; i<=8160; i=i+32)) do ethtool -G enp130s0f0 rx $i tx $i sleep 0.5 ethtool -g enp130s0f0 done done This resulted in crash: i40e 0000:01:00.1: Unable to allocate memory for the Rx descriptor ring, size=65536 Driver BUG WARNING: CPU: 0 PID: 4300 at net/core/xdp.c:141 xdp_rxq_info_unreg+0x43/0x50 Call Trace: i40e_free_rx_resources+0x70/0x80 [i40e] i40e_set_ringparam+0x27c/0x800 [i40e] ethnl_set_rings+0x1b2/0x290 genl_family_rcv_msg_doit.isra.15+0x10f/0x150 genl_family_rcv_msg+0xb3/0x160 ? rings_fill_reply+0x1a0/0x1a0 genl_rcv_msg+0x47/0x90 ? genl_family_rcv_msg+0x160/0x160 netlink_rcv_skb+0x4c/0x120 genl_rcv+0x24/0x40 netlink_unicast+0x196/0x230 netlink_sendmsg+0x204/0x3d0 sock_sendmsg+0x4c/0x50 __sys_sendto+0xee/0x160 ? handle_mm_fault+0xbe/0x1e0 ? syscall_trace_enter+0x1d3/0x2c0 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca RIP: 0033:0x7f5eac8b035b Missing register, driver bug WARNING: CPU: 0 PID: 4300 at net/core/xdp.c:119 xdp_rxq_info_unreg_mem_model+0x69/0x140 Call Trace: xdp_rxq_info_unreg+0x1e/0x50 i40e_free_rx_resources+0x70/0x80 [i40e] i40e_set_ringparam+0x27c/0x800 [i40e] ethnl_set_rings+0x1b2/0x290 genl_family_rcv_msg_doit.isra.15+0x10f/0x150 genl_family_rcv_msg+0xb3/0x160 ? rings_fill_reply+0x1a0/0x1a0 genl_rcv_msg+0x47/0x90 ? genl_family_rcv_msg+0x160/0x160 netlink_rcv_skb+0x4c/0x120 genl_rcv+0x24/0x40 netlink_unicast+0x196/0x230 netlink_sendmsg+0x204/0x3d0 sock_sendmsg+0x4c/0x50 __sys_sendto+0xee/0x160 ? handle_mm_fault+0xbe/0x1e0 ? syscall_trace_enter+0x1d3/0x2c0 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca RIP: 0033:0x7f5eac8b035b This was caused because of new buffers with different RX ring count should substitute older ones, but those buffers were freed in i40e_configure_rx_ring and reallocated again with i40e_alloc_rx_bi, thus kfree on rx_bi caused leak of already mapped DMA. Fix this by reallocating ZC with rx_bi_zc struct when BPF program loads. Additionally reallocate back to rx_bi when BPF program unloads. If BPF program is loaded/unloaded and XSK pools are created, reallocate RX queues accordingly in XSP_SETUP_XSK_POOL handler. Fixes: be1222b585fd ("i40e: Separate kernel allocated rx_bi rings from AF_XDP rings") Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan <chandanx.rout@intel.com> (A Contingent Worker at Intel) Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
78f31931 |
|
23-Jun-2022 |
Ciara Loftus <ciara.loftus@intel.com> |
i40e: read the XDP program once per NAPI Similar to how it's done in the ice driver since 'eb087cd82864 ("ice: propagate xdp_ring onto rx_ring")', read the XDP program once per NAPI instead of once per descriptor cleaned. I measured an improvement in throughput of 2% for the AF_XDP xdpsock l2fwd benchmark for zero copy mode and 1% for copy mode. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Link: https://lore.kernel.org/r/20220623100852.7867-1-ciara.loftus@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
9d87e41a |
|
21-Apr-2022 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
i40e, xsk: Get rid of redundant 'fallthrough' Intel drivers translate actions returned from XDP programs to their own return codes that have the following mapping: XDP_REDIRECT -> I40E_XDP_{REDIR,CONSUMED} XDP_TX -> I40E_XDP_{TX,CONSUMED} XDP_DROP -> I40E_XDP_CONSUMED XDP_ABORTED -> I40E_XDP_CONSUMED XDP_PASS -> I40E_XDP_PASS Commit b8aef650e549 ("i40e, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full") introduced new translation XDP_REDIRECT -> I40E_XDP_EXIT which is set when XSK RQ gets full and to indicate that driver should stop further Rx processing. This happens for unsuccessful xdp_do_redirect() so it is valuable to call trace_xdp_exception() for this case. In order to avoid I40E_XDP_EXIT -> IXGBE_XDP_CONSUMED overwrite, XDP_DROP case was moved above which in turn made the 'fallthrough' that is in XDP_ABORTED useless as it became the last label in the switch statement. Simply drop this leftover. Fixes: b8aef650e549 ("i40e, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220421132126.471515-3-maciej.fijalkowski@intel.com
|
#
ed7ae2d6 |
|
13-Apr-2022 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
i40e, xsk: Diversify return values from xsk_wakeup call paths Currently, when debugging AF_XDP workloads, one can correlate the -ENXIO return code as the case that XSK is not in the bound state. Returning same code from ndo_xsk_wakeup can be misleading and simply makes it harder to follow what is going on. Change ENXIOs in i40e's ndo_xsk_wakeup() implementation to EINVALs, so that when probing it is clear that something is wrong on the driver side, not the xsk_{recv,send}msg. There is a -ENETDOWN that can happen from both kernel/driver sides though, but I don't have a correct replacement for this on one of the sides, so let's keep it that way. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-10-maciej.fijalkowski@intel.com
|
#
b8aef650 |
|
13-Apr-2022 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
i40e, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full When XSK pool uses need_wakeup feature, correlate -ENOBUFS that was returned from xdp_do_redirect() with a XSK Rx queue being full. In such case, terminate the Rx processing that is being done on the current HW Rx ring and let the user space consume descriptors from XSK Rx queue so that there is room that driver can use later on. Introduce new internal return code I40E_XDP_EXIT that will indicate case described above. Note that it does not affect Tx processing that is bound to the same NAPI context, nor the other Rx rings. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220413153015.453864-7-maciej.fijalkowski@intel.com
|
#
7e1b54d0 |
|
18-Feb-2022 |
Alexander Lobakin <alexandr.lobakin@intel.com> |
i40e: remove dead stores on XSK hotpath The 'if (ntu == rx_ring->count)' block in i40e_alloc_rx_buffers_zc() was previously residing in the loop, but after introducing the batched interface it is used only to wrap-around the NTU descriptor, thus no more need to assign 'xdp'. 'cleaned_count' in i40e_clean_rx_irq_zc() was previously being incremented in the loop, but after commit f12738b6ec06 ("i40e: remove unnecessary cleaned_count updates") it gets assigned only once after it, so the initialization can be dropped. Fixes: 6aab0bb0c5cd ("i40e: Use the xsk batched rx allocation interface") Fixes: f12738b6ec06 ("i40e: remove unnecessary cleaned_count updates") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6dba2953 |
|
08-Dec-2021 |
Alexander Lobakin <alexandr.lobakin@intel.com> |
i40e: respect metadata on XSK Rx to skb For now, if the XDP prog returns XDP_PASS on XSK, the metadata will be lost as it doesn't get copied to the skb. Copy it along with the frame headers. Account its size on skb allocation, and when copying just treat it as a part of the frame and do a pull after to "move" it to the "reserved" zone. net_prefetch() xdp->data_meta and align the copy size to speed-up memcpy() a little and better match i40e_construct_skb(). Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support") Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
bc97f9c6 |
|
08-Dec-2021 |
Alexander Lobakin <alexandr.lobakin@intel.com> |
i40e: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb {__,}napi_alloc_skb() allocates and reserves additional NET_SKB_PAD + NET_IP_ALIGN for any skb. OTOH, i40e_construct_skb_zc() currently allocates and reserves additional `xdp->data - xdp->data_hard_start`, which is XDP_PACKET_HEADROOM for XSK frames. There's no need for that at all as the frame is post-XDP and will go only to the networking stack core. Pass the size of the actual data only to __napi_alloc_skb() and don't reserve anything. This will give enough headroom for stack processing. Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
d1bc532e |
|
25-Jan-2022 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: xsk: Move tmp desc array from driver to pool Move desc_array from the driver to the pool. The reason behind this is that we can then reuse this array as a temporary storage for descriptors in all zero-copy drivers that use the batched interface. This will make it easier to add batching to more drivers. i40e is the only driver that has a batched Tx zero-copy implementation, so no need to touch any other driver. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20220125160446.78976-6-maciej.fijalkowski@intel.com
|
#
c8064e5b |
|
30-Nov-2021 |
Paolo Abeni <pabeni@redhat.com> |
bpf: Let bpf_warn_invalid_xdp_action() report more info In non trivial scenarios, the action id alone is not sufficient to identify the program causing the warning. Before the previous patch, the generated stack-trace pointed out at least the involved device driver. Let's additionally include the program name and id, and the relevant device name. If the user needs additional infos, he can fetch them via a kernel probe, leveraging the arguments added here. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com
|
#
3c6f3ae3 |
|
29-Sep-2021 |
Yang Li <yang.lee@linux.alibaba.com> |
intel: Simplify bool conversion Fix the following coccicheck warning: ./drivers/net/ethernet/intel/i40e/i40e_xsk.c:229:35-40: WARNING: conversion to bool not needed here ./drivers/net/ethernet/intel/ice/ice_xsk.c:399:35-40: WARNING: conversion to bool not needed here Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
6aab0bb0 |
|
22-Sep-2021 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: Use the xsk batched rx allocation interface Use the new xsk batched rx allocation interface for the zero-copy data path. As the array of struct xdp_buff pointers kept by the driver is really a ring that wraps, the allocation routine is modified to detect a wrap and in that case call the allocation function twice. The allocation function cannot deal with wrapped rings, only arrays. As we now know exactly how many buffers we get and that there is no wrapping, the allocation function can be simplified even more as all if-statements in the allocation loop can be removed, improving performance. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-6-magnus.karlsson@gmail.com
|
#
49589b23 |
|
24-Jun-2021 |
Toke Høiland-Jørgensen <toke@redhat.com> |
intel: Remove rcu_read_lock() around XDP program invocation The Intel drivers all have rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> # i40e Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/bpf/20210624160609.292325-12-toke@redhat.com
|
#
f6c10b48 |
|
10-May-2021 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: add correct exception tracing for XDP Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared. Fixes: 74608d17fe29 ("i40e: add support for XDP_TX action") Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
262de08f |
|
18-Mar-2021 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
intel: clean up mismatched header comments A bunch of header comments were showing warnings when compiling with W=1. Fix them all at once. This changes only comments. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
346497c7 |
|
02-Dec-2020 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: optimize for XDP_REDIRECT in xsk path Optimize i40e_run_xdp_zc() for the XDP program verdict being XDP_REDIRECT in the xsk zero-copy path. This path is only used when having AF_XDP zero-copy on and in that case most packets will be directed to user space. This provides a little over 100k extra packets in throughput on my server when running l2fwd in xdpsock. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
528060ef |
|
19-Mar-2021 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: fix receiving of single packets in xsk zero-copy mode Fix so that single packets are received immediately instead of in batches of 8. If you sent 1 pps to a system, you received 8 packets every 8 seconds instead of 1 packet every second. The problem behind this was that the work_done reporting from the Tx part of the driver was broken. The work_done reporting in i40e controls not only the reporting back to the napi logic but also the setting of the interrupt throttling logic. When Tx or Rx reports that it has more to do, interrupts are throttled or coalesced and when they both report that they are done, interrupts are armed right away. If the wrong work_done value is returned, the logic will start to throttle interrupts in a situation where it should have just enabled them. This leads to the undesired batching behavior seen in user-space. Fix this by returning the correct boolean value from the Tx xsk zero-copy path. Return true if there is nothing to do or if we got fewer packets to process than we asked for. Return false if we got as many packets as the budget since there might be more packets we can process. Fixes: 3106c580fb7c ("i40e: Use batched xsk Tx interfaces to increase performance") Reported-by: Sreedevi Joshi <sreedevi.joshi@intel.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
b32cddd2 |
|
05-Feb-2021 |
Norbert Ciosek <norbertx.ciosek@intel.com> |
i40e: Fix endianness conversions Fixes the following sparse warnings: i40e_main.c:5953:32: warning: cast from restricted __le16 i40e_main.c:8008:29: warning: incorrect type in assignment (different base types) i40e_main.c:8008:29: expected unsigned int [assigned] [usertype] ipa i40e_main.c:8008:29: got restricted __le32 [usertype] i40e_main.c:8008:29: warning: incorrect type in assignment (different base types) i40e_main.c:8008:29: expected unsigned int [assigned] [usertype] ipa i40e_main.c:8008:29: got restricted __le32 [usertype] i40e_txrx.c:1950:59: warning: incorrect type in initializer (different base types) i40e_txrx.c:1950:59: expected unsigned short [usertype] vlan_tag i40e_txrx.c:1950:59: got restricted __le16 [usertype] l2tag1 i40e_txrx.c:1953:40: warning: cast to restricted __le16 i40e_xsk.c:448:38: warning: invalid assignment: |= i40e_xsk.c:448:38: left side has type restricted __le64 i40e_xsk.c:448:38: right side has type int Fixes: 2f4b411a3d67 ("i40e: Enable cloud filters via tc-flower") Fixes: 2a508c64ad27 ("i40e: fix VLAN.TCI == 0 RX HW offload") Fixes: 3106c580fb7c ("i40e: Use batched xsk Tx interfaces to increase performance") Fixes: 8f88b3034db3 ("i40e: Add infrastructure for queue channel support") Signed-off-by: Norbert Ciosek <norbertx.ciosek@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
f892a9af |
|
18-Jan-2021 |
Björn Töpel <bjorn@kernel.org> |
i40e: Simplify the do-while allocation loop Fold the count decrement into the while-statement. Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
f020fa1a |
|
14-Jan-2021 |
Cristian Dumitrescu <cristian.dumitrescu@intel.com> |
i40e: consolidate handling of XDP program actions Consolidate the actions performed on the packet based on the XDP program result into a separate function that is easier to read and maintain. Simplify the i40e_construct_skb_zc function, so that the input xdp buffer is always freed, regardless of whether the output skb is successfully created or not. Simplify the behavior of the i40e_clean_rx_irq_zc function, so that the current packet descriptor is dropped when function i40_construct_skb_zc returns an error as opposed to re-processing the same description on the next invocation. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
d4178c31 |
|
14-Jan-2021 |
Cristian Dumitrescu <cristian.dumitrescu@intel.com> |
i40e: remove the redundant buffer info updates For performance reasons, remove the redundant buffer info updates (*bi = NULL). The buffers ready to be cleaned can easily be tracked based on the ring next-to-clean variable, which is consistently updated. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
f12738b6 |
|
14-Jan-2021 |
Cristian Dumitrescu <cristian.dumitrescu@intel.com> |
i40e: remove unnecessary cleaned_count updates For performance reasons, remove the redundant updates of the cleaned_count variable, as its value can be computed based on the ring next-to-clean variable, which is consistently updated. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
c8a8ca34 |
|
14-Jan-2021 |
Cristian Dumitrescu <cristian.dumitrescu@intel.com> |
i40e: remove unnecessary memory writes of the next to clean pointer For performance reasons, avoid writing the ring next-to-clean pointer value back to memory on every update, as it is not really necessary. Instead, simply read it at initialization into a local copy, update the local copy as necessary and write the local copy back to memory after the last update. Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
7128c834 |
|
11-Jan-2021 |
Cristian Dumitrescu <cristian.dumitrescu@intel.com> |
i40e: fix potential NULL pointer dereferencing Currently, the function i40e_construct_skb_zc only frees the input xdp buffer when the output skb is successfully built. On error, the function i40e_clean_rx_irq_zc does not commit anything for the current packet descriptor and simply exits the packet descriptor processing loop, with the plan to restart the processing of this descriptor on the next invocation. Therefore, on error the ring next-to-clean pointer should not advance, the xdp i.e. *bi buffer should not be freed and the current buffer info should not be invalidated by setting *bi to NULL. Therefore, the *bi should only be set to NULL when the function i40e_construct_skb_zc is successful, otherwise a NULL *bi will be dereferenced when the work for the current descriptor is eventually restarted. Fixes: 3b4f0b66c2b3 ("i40e, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL") Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/r/20210111181138.49757-1-cristian.dumitrescu@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
64050b5b |
|
11-Dec-2020 |
Björn Töpel <bjorn@kernel.org> |
i40e, xsk: clear the status bits for the next_to_use descriptor On the Rx side, the next_to_use index points to the next item in the HW ring to be refilled/allocated, and next_to_clean points to the next item to potentially be processed. When the HW Rx ring is fully refilled, i.e. no packets has been processed, the next_to_use will be next_to_clean - 1. When the ring is fully processed next_to_clean will be equal to next_to_use. The latter case is where a bug is triggered. If the next_to_use bits are not cleared, and the "fully processed" state is entered, a stale descriptor can be processed. The skb-path correctly clear the status bit for the next_to_use descriptor, but the AF_XDP zero-copy path did not do that. This change adds the status bits clearing of the next_to_use descriptor. Fixes: 3b4f0b66c2b3 ("i40e, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
088d5360 |
|
08-Sep-2020 |
Marek Majtyka <marekx.majtyka@intel.com> |
i40e: remove redundant assignment Remove a redundant assignment of the software ring pointer in the i40e driver. The variable is assigned twice with no use in between, so just get rid of the first occurrence. Fixes: 3b4f0b66c2b3 ("i40e, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL") Signed-off-by: Marek Majtyka <marekx.majtyka@intel.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
3106c580 |
|
15-Nov-2020 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: Use batched xsk Tx interfaces to increase performance Use the new batched xsk interfaces for the Tx path in the i40e driver to improve performance. On my machine, this yields a throughput increase of 4% for the l2fwd sample app in xdpsock. If we instead just look at the Tx part, this patch set increases throughput with above 20% for Tx. Note that I had to explicitly loop unroll the inner loop to get to this performance level, by using a pragma. It is honored by both clang and gcc and should be ignored by versions that do not support it. Using the -funroll-loops compiler command line switch on the source file resulted in a loop unrolling on a higher level that lead to a performance decrease instead of an increase. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1605525167-14450-6-git-send-email-magnus.karlsson@gmail.com
|
#
f320460b |
|
15-Nov-2020 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: Remove unnecessary sw_ring access from xsk Tx Remove the unnecessary access to the software ring for the AF_XDP zero-copy driver. This was used to record the length of the packet so that the driver Tx completion code could sum this up to produce the total bytes sent. This is now performed during the transmission of the packet, so no need to record this in the software ring. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1605525167-14450-3-git-send-email-magnus.karlsson@gmail.com
|
#
1773482f |
|
16-Sep-2020 |
Dan Carpenter <dan.carpenter@oracle.com> |
i40e, xsk: uninitialized variable in i40e_clean_rx_irq_zc() The "failure" variable is used without being initialized. It should be set to false. Fixes: 8cbf74149903 ("i40e, xsk: move buffer allocation out of the Rx processing loop") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
8cbf7414 |
|
25-Aug-2020 |
Björn Töpel <bjorn@kernel.org> |
i40e, xsk: move buffer allocation out of the Rx processing loop Instead of checking in each iteration of the Rx packet processing loop, move the allocation out of the loop and do it once for each napi activation. For AF_XDP the rx_drop benchmark was improved by 6%. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
f78bd130 |
|
25-Aug-2020 |
Björn Töpel <bjorn@kernel.org> |
i40e, xsk: remove HW descriptor prefetch in AF_XDP path The software prefetching of HW descriptors has a negative impact on the performance. Therefore, it is now removed. Performance for the rx_drop benchmark increased with 2%. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
9647c57b |
|
28-Aug-2020 |
Magnus Karlsson <magnus.karlsson@intel.com> |
xsk: i40e: ice: ixgbe: mlx5: Test for dma_need_sync earlier for better performance Test for dma_need_sync earlier to increase performance. xsk_buff_dma_sync_for_cpu() takes an xdp_buff as parameter and from that the xsk_buff_pool reference is dug out. Perf shows that this dereference causes a lot of cache misses. But as the buffer pool is now sent down to the driver at zero-copy initialization time, we might as well use this pointer directly, instead of going via the xsk_buff and we can do so already in xsk_buff_dma_sync_for_cpu() instead of in xp_dma_sync_for_cpu. This gets rid of these cache misses. Throughput increases with 3% for the xdpsock l2fwd sample application on my machine. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-11-git-send-email-magnus.karlsson@intel.com
|
#
c4655761 |
|
28-Aug-2020 |
Magnus Karlsson <magnus.karlsson@intel.com> |
xsk: i40e: ice: ixgbe: mlx5: Rename xsk zero-copy driver interfaces Rename the AF_XDP zero-copy driver interface functions to better reflect what they do after the replacement of umems with buffer pools in the previous commit. Mostly it is about replacing the umem name from the function names with xsk_buff and also have them take the a buffer pool pointer instead of a umem. The various ring functions have also been renamed in the process so that they have the same naming convention as the internal functions in xsk_queue.h. This so that it will be clearer what they do and also for consistency. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-3-git-send-email-magnus.karlsson@intel.com
|
#
1742b3d5 |
|
28-Aug-2020 |
Magnus Karlsson <magnus.karlsson@intel.com> |
xsk: i40e: ice: ixgbe: mlx5: Pass buffer pool to driver instead of umem Replace the explicit umem reference passed to the driver in AF_XDP zero-copy mode with the buffer pool instead. This in preparation for extending the functionality of the zero-copy mode so that umems can be shared between queues on the same netdev and also between netdevs. In this commit, only an umem reference has been added to the buffer pool struct. But later commits will add other entities to it. These are going to be entities that are different between different queue ids and netdevs even though the umem is shared between them. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com
|
#
1fd972eb |
|
23-Jun-2020 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: move check of full Tx ring to outside of send loop Move the check if the HW Tx ring is full to outside the send loop. Currently it is checked for every single descriptor that we send. Instead, tell the send loop to only process a maximum number of packets equal to the number of available slots in the Tx ring. This way, we can remove the check inside the send loop to and gain some performance. Suggested-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
5574ff7b |
|
23-Jun-2020 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: optimize AF_XDP Tx completion path Improve the performance of the AF_XDP zero-copy Tx completion path. When there are no XDP buffers being sent using XDP_TX or XDP_REDIRECT, we do not have go through the SW ring to clean up any entries since the AF_XDP path does not use these. In these cases, just fast forward the next-to-use counter and skip going through the SW ring. The limit on the maximum number of entries to complete is also removed since the algorithm is now O(1). To simplify the code path, the maximum number of entries to complete for the XDP path is therefore also increased from 256 to 512 (the default number of Tx HW descriptors). This should be fine since the completion in the XDP path is faster than in the SKB path that has 256 as the maximum number. This patch provides around 4% throughput improvement for the l2fwd application in xdpsock on my machine. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
5463fce6 |
|
03-Jun-2020 |
Jeff Kirsher <jeffrey.t.kirsher@intel.com> |
ethernet/intel: Convert fallthrough code comments Convert all the remaining 'fall through" code comments to the newer 'fallthrough;' keyword. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
e92c0e02 |
|
16-Mar-2020 |
Jesper Dangaard Brouer <brouer@redhat.com> |
i40e: trivial fixup of comments in i40e_xsk.c The comment above i40e_run_xdp_zc() was clearly copy-pasted from function i40e_xsk_umem_setup, which is just above. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
3b4f0b66 |
|
20-May-2020 |
Björn Töpel <bjorn@kernel.org> |
i40e, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL Remove MEM_TYPE_ZERO_COPY in favor of the new MEM_TYPE_XSK_BUFF_POOL APIs. The AF_XDP zero-copy rx_bi ring is now simply a struct xdp_buff pointer. v4->v5: Fixed "warning: Excess function parameter 'bi' description in 'i40e_construct_skb_zc'". (Jakub) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/bpf/20200520192103.355233-9-bjorn.topel@gmail.com
|
#
be1222b5 |
|
20-May-2020 |
Björn Töpel <bjorn@kernel.org> |
i40e: Separate kernel allocated rx_bi rings from AF_XDP rings Continuing the path to support MEM_TYPE_XSK_BUFF_POOL, the AF_XDP zero-copy/sk_buff rx_bi rings are now separate. Functions to properly allocate the different rings are added as well. v3->v4: Made i40e_fd_handle_status() static. (kbuild test robot) v4->v5: Fix kdoc for i40e_clean_programming_status(). (Jakub) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/bpf/20200520192103.355233-8-bjorn.topel@gmail.com
|
#
e1675f97 |
|
20-May-2020 |
Björn Töpel <bjorn@kernel.org> |
i40e: Refactor rx_bi accesses As a first step to migrate i40e to the new MEM_TYPE_XSK_BUFF_POOL APIs, code that accesses the rx_bi (SW/shadow ring) is refactored to use an accessor function. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/bpf/20200520192103.355233-7-bjorn.topel@gmail.com
|
#
a71506a4 |
|
20-May-2020 |
Magnus Karlsson <magnus.karlsson@intel.com> |
xsk: Move driver interface to xdp_sock_drv.h Move the AF_XDP zero-copy driver interface to its own include file called xdp_sock_drv.h. This, hopefully, will make it more clear for NIC driver implementors to know what functions to use for zero-copy support. v4->v5: Fix -Wmissing-prototypes by include header file. (Jakub) Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-4-bjorn.topel@gmail.com
|
#
2a637c5b |
|
13-May-2020 |
Jesper Dangaard Brouer <brouer@redhat.com> |
xdp: For Intel AF_XDP drivers add XDP frame_sz Intel drivers implement native AF_XDP zerocopy in separate C-files, that have its own invocation of bpf_prog_run_xdp(). The setup of xdp_buff is also handled in separately from normal code path. This patch update XDP frame_sz for AF_XDP zerocopy drivers i40e, ice and ixgbe, as the code changes needed are very similar. Introduce a helper function xsk_umem_xdp_frame_sz() for calculating frame size. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Björn Töpel <bjorn.topel@intel.com> Cc: intel-wired-lan@lists.osuosl.org Cc: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/158945347511.97035.8536753731329475655.stgit@firesoul
|
#
c77e9f09 |
|
04-Feb-2020 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
i40e: Relax i40e_xsk_wakeup's return value when PF is busy Return -EAGAIN instead of -ENETDOWN to provide a slightly milder information to user space so that an application will know to retry the syscall when __I40E_CONFIG_BUSY bit is set on pf->state. Fixes: b3873a5be757 ("net/i40e: Fix concurrency issues between config flow and XSK") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20200205045834.56795-2-maciej.fijalkowski@intel.com
|
#
f8509aa0 |
|
19-Dec-2019 |
Magnus Karlsson <magnus.karlsson@intel.com> |
xsk: ixgbe: i40e: ice: mlx5: Xsk_umem_discard_addr to xsk_umem_release_addr Change the name of xsk_umem_discard_addr to xsk_umem_release_addr to better reflect the new naming of the AF_XDP queue manipulation functions. As this functions is used by drivers implementing support for AF_XDP zero-copy, it requires a name change to these drivers. The function xsk_umem_release_addr_rq has also changed name in the same fashion. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-10-git-send-email-magnus.karlsson@intel.com
|
#
b3873a5b |
|
17-Dec-2019 |
Maxim Mikityanskiy <maximmi@mellanox.com> |
net/i40e: Fix concurrency issues between config flow and XSK Use synchronize_rcu to wait until the XSK wakeup function finishes before destroying the resources it uses: 1. i40e_down already calls synchronize_rcu. On i40e_down either __I40E_VSI_DOWN or __I40E_CONFIG_BUSY is set. Check the latter in i40e_xsk_wakeup (the former is already checked there). 2. After switching the XDP program, call synchronize_rcu to let i40e_xsk_wakeup exit before the XDP program is freed. 3. Changing the number of channels brings the interface down (see i40e_prep_for_reset and i40e_pf_quiesce_all_vsi). 4. Disabling UMEM sets __I40E_CONFIG_BUSY, too. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191217162023.16011-4-maximmi@mellanox.com
|
#
70563957 |
|
08-Nov-2019 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: need_wakeup flag might not be set for Tx The need_wakeup flag for Tx might not be set for AF_XDP sockets that are only used to send packets. This happens if there is at least one outstanding packet that has not been completed by the hardware and we get that corresponding completion (which will not generate an interrupt since interrupts are disabled in the napi poll loop) between the time we stopped processing the Tx completions and interrupts are enabled again. In this case, the need_wakeup flag will have been cleared at the end of the Tx completion processing as we believe we will get an interrupt from the outstanding completion at a later point in time. But if this completion interrupt occurs before interrupts are enable, we lose it and should at that point really have set the need_wakeup flag since there are no more outstanding completions that can generate an interrupt to continue the processing. When this happens, user space will see a Tx queue need_wakeup of 0 and skip issuing a syscall, which means will never get into the Tx processing again and we have a deadlock. This patch introduces a quick fix for this issue by just setting the need_wakeup flag for Tx to 1 all the time. I am working on a proper fix for this that will toggle the flag appropriately, but it is more challenging than I anticipated and I am afraid that this patch will not be completed before the merge window closes, therefore this easier fix for now. This fix has a negative performance impact in the range of 0% to 4%. Towards the higher end of the scale if you have driver and application on the same core and issue a lot of packets, and towards no negative impact if you use two cores, lower transmission speeds and/or a workload that also receives packets. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
2c19e395 |
|
07-Oct-2019 |
Jeff Kirsher <jeffrey.t.kirsher@intel.com> |
i40e: Fix receive buffer starvation for AF_XDP Magnus's fix to resolve a potential receive buffer starvation for AF_XDP got applied to both the i40e_xsk_umem_enable/disable() functions, when it should have only been applied to the "enable". So clean up the undesired code in the disable function. CC: Magnus Karlsson <magnus.karlsson@intel.com> Fixes: 1f459bdc2007 ("i40e: fix potential RX buffer starvation for AF_XDP") Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
|
#
168dfc3a |
|
13-Sep-2019 |
Ciara Loftus <ciara.loftus@intel.com> |
i40e: fix xdp handle calculations Commit 4c5d9a7fa149 ("i40e: fix xdp handle calculations") reintroduced the addition of the umem headroom to the xdp handle in the i40e_zca_free, i40e_alloc_buffer_slow_zc and i40e_alloc_buffer_zc functions. However, the headroom is already added to the handle in the function i40_run_xdp_zc. This commit removes the latter addition and fixes the case where the headroom is non-zero. Fixes: 4c5d9a7fa149 ("i40e: fix xdp handle calculations") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
1f459bdc |
|
09-Sep-2019 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: fix potential RX buffer starvation for AF_XDP When the RX rings are created they are also populated with buffers so that packets can be received. Usually these are kernel buffers, but for AF_XDP in zero-copy mode, these are user-space buffers and in this case the application might not have sent down any buffers to the driver at this point. And if no buffers are allocated at ring creation time, no packets can be received and no interrupts will be generated so the NAPI poll function that allocates buffers to the rings will never get executed. To rectify this, we kick the NAPI context of any queue with an attached AF_XDP zero-copy socket in two places in the code. Once after an XDP program has loaded and once after the umem is registered. This take care of both cases: XDP program gets loaded first then AF_XDP socket is created, and the reverse, AF_XDP socket is created first, then XDP program is loaded. Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
4c5d9a7f |
|
04-Sep-2019 |
Kevin Laatz <kevin.laatz@intel.com> |
i40e: fix xdp handle calculations Currently, we don't add headroom to the handle in i40e_zca_free, i40e_alloc_buffer_slow_zc and i40e_alloc_buffer_zc. The addition of the headroom to the handle was removed in commit 2f86c806a8a8 ("i40e: modify driver for handling offsets"), which will break things when headroom is non-zero. This patch fixes this and uses xsk_umem_adjust_offset to add it appropritely based on the mode being run. Fixes: 2f86c806a8a8 ("i40e: modify driver for handling offsets") Reported-by: Bjorn Topel <bjorn.topel@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
2f86c806 |
|
26-Aug-2019 |
Kevin Laatz <kevin.laatz@intel.com> |
i40e: modify driver for handling offsets With the addition of the unaligned chunks option, we need to make sure we handle the offsets accordingly based on the mode we are currently running in. This patch modifies the driver to appropriately mask the address for each case. Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
10912fc9 |
|
26-Aug-2019 |
Kevin Laatz <kevin.laatz@intel.com> |
i40e: simplify Rx buffer recycle Currently, the dma, addr and handle are modified when we reuse Rx buffers in zero-copy mode. However, this is not required as the inputs to the function are copies, not the original values themselves. As we use the copies within the function, we can use the original 'old_bi' values directly without having to mask and add the headroom. Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
3d0c5f1c |
|
14-Aug-2019 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: add support for AF_XDP need_wakeup feature This patch adds support for the need_wakeup feature of AF_XDP. If the application has told the kernel that it might sleep using the new bind flag XDP_USE_NEED_WAKEUP, the driver will then set this flag if it has no more buffers on the NIC Rx ring and yield to the application. For Tx, it will set the flag if it has no outstanding Tx completion interrupts and return to the application. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
9116e5e2 |
|
14-Aug-2019 |
Magnus Karlsson <magnus.karlsson@intel.com> |
xsk: replace ndo_xsk_async_xmit with ndo_xsk_wakeup This commit replaces ndo_xsk_async_xmit with ndo_xsk_wakeup. This new ndo provides the same functionality as before but with the addition of a new flags field that is used to specifiy if Rx, Tx or both should be woken up. The previous ndo only woke up Tx, as implied by the name. The i40e and ixgbe drivers (which are all the supported ones) are updated with this new interface. This new ndo will be used by the new need_wakeup functionality of XDP sockets that need to be able to wake up both Rx and Tx driver processing. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
4bce4e5c |
|
26-Jun-2019 |
Maxim Mikityanskiy <maximmi@mellanox.com> |
xsk: Return the whole xdp_desc from xsk_umem_consume_tx Some drivers want to access the data transmitted in order to implement acceleration features of the NICs. It is also useful in AF_XDP TX flow. Change the xsk_umem_consume_tx API to return the whole xdp_desc, that contains the data pointer, length and DMA address, instead of only the latter two. Adapt the implementation of i40e and ixgbe to this change. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
514af5f0 |
|
01-May-2019 |
Gustavo A. R. Silva <gustavo@embeddedor.com> |
i40e: mark expected switch fall-through In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warning: drivers/net/ethernet/intel/i40e/i40e_xsk.c: In function ‘i40e_run_xdp_zc’: drivers/net/ethernet/intel/i40e/i40e_xsk.c:217:3: warning: this statement may fall through [-Wimplicit-fallthrough=] bpf_warn_invalid_xdp_action(act); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/intel/i40e/i40e_xsk.c:218:2: note: here case XDP_ABORTED: ^~~~ Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
44ddd4f1 |
|
12-Feb-2019 |
Björn Töpel <bjorn@kernel.org> |
i40e: add tracking of AF_XDP ZC state for each queue pair In commit f3fef2b6e1cc ("i40e: Remove umem from VSI") a regression was introduced; When the VSI was reset, the setup code would try to enable AF_XDP ZC unconditionally (as long as there was a umem placed in the netdev._rx struct). Here, we add a bitmap to the VSI that tracks if a certain queue pair has been "zero-copy enabled" via the ndo_bpf. The bitmap is used in i40e_xsk_umem, and enables zero-copy if and only if XDP is enabled, the corresponding qid in the bitmap is set and the umem is non-NULL. Fixes: f3fef2b6e1cc ("i40e: Remove umem from VSI") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
14ffeb52 |
|
29-Jan-2019 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: fix potential RX buffer starvation for AF_XDP When the RX rings are created they are also populated with buffers so that packets can be received. Usually these are kernel buffers, but for AF_XDP in zero-copy mode, these are user-space buffers and in this case the application might not have sent down any buffers to the driver at this point. And if no buffers are allocated at ring creation time, no packets can be received and no interrupts will be generated so the NAPI poll function that allocates buffers to the rings will never get executed. To rectify this, we kick the NAPI context of any queue with an attached AF_XDP zero-copy socket in two places in the code. Once after an XDP program has loaded and once after the umem is registered. This take care of both cases: XDP program gets loaded first then AF_XDP socket is created, and the reverse, AF_XDP socket is created first, then XDP program is loaded. Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
f8ebfaf6 |
|
13-Feb-2019 |
Jan Sokolowski <jan.sokolowski@intel.com> |
net: bpf: remove XDP_QUERY_XSK_UMEM enumerator Commit c9b47cc1fabc ("xsk: fix bug when trying to use both copy and zero-copy on one queue id") moved the umem query code to the AF_XDP core, and therefore removed the need to query the netdevice for a umem. This patch removes XDP_QUERY_XSK_UMEM and all code that implement that behavior, which is just dead code. Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
f3fef2b6 |
|
18-Dec-2018 |
Jan Sokolowski <jan.sokolowski@intel.com> |
i40e: Remove umem from VSI As current implementation of netdev already contains and provides umems for us, we no longer have the need to contain these structures in i40e_vsi. Refactor the code to operate on netdev-provided umems. Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
800b8f63 |
|
04-Dec-2018 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
i40e: DRY rx_ptype handling code Move rx_ptype extracting to i40e_process_skb_fields() to avoid duplicating the code. Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
2a508c64 |
|
04-Dec-2018 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
i40e: fix VLAN.TCI == 0 RX HW offload This fixes two bugs in hardware VLAN offload: 1. VLAN.TCI == 0 was being dropped 2. there was a race between disabling of VLAN RX feature in hardware and processing RX queue, where packets processed in this window could have their VLAN information dropped Fix moves the VLAN handling into i40e_process_skb_fields() to save on duplicated code. i40e_receive_skb() becomes trivial and so is removed. Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
529eb362 |
|
27-Nov-2018 |
Jan Sokolowski <jan.sokolowski@intel.com> |
i40e: fix kerneldoc for xsk methods One method, xsk_umem_setup, had an incorrect kernel doc description, which has been corrected. Also fixes small typos found in the comments. Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
3ab52af5 |
|
07-Sep-2018 |
Björn Töpel <bjorn@kernel.org> |
i40e: disallow changing the number of descriptors when AF_XDP is on When an AF_XDP UMEM is attached to any of the Rx rings, we disallow a user to change the number of descriptors via e.g. "ethtool -G IFNAME". Otherwise, the size of the stash/reuse queue can grow unbounded, which would result in OOM or leaking userspace buffers. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
411dc16f |
|
07-Sep-2018 |
Björn Töpel <bjorn@kernel.org> |
i40e: clean zero-copy XDP Rx ring on shutdown/reset Outstanding Rx descriptors are temporarily stored on a stash/reuse queue. When/if the HW rings comes up again, entries from the stash are used to re-populate the ring. The latter required some restructuring of the allocation scheme for the AF_XDP zero-copy implementation. There is now a fast, and a slow allocation. The "fast allocation" is used from the fast-path and obtains free buffers from the fill ring and the internal recycle mechanism. The "slow allocation" is only used in ring setup, and obtains buffers from the fill ring and the stash (if any). Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
9dbb1370 |
|
07-Sep-2018 |
Björn Töpel <bjorn@kernel.org> |
i40e: clean zero-copy XDP Tx ring on shutdown/reset When the zero-copy enabled XDP Tx ring is torn down, due to configuration changes, outstanding frames on the hardware descriptor ring are queued on the completion ring. The completion ring has a back-pressure mechanism that will guarantee that there is sufficient space on the ring. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
93ee30f3 |
|
31-Aug-2018 |
Magnus Karlsson <magnus.karlsson@intel.com> |
xsk: i40e: get rid of useless struct xdp_umem_props This commit gets rid of the structure xdp_umem_props. It was there to be able to break a dependency at one point, but this is no longer needed. The values in the struct are instead stored directly in the xdp_umem structure. This simplifies the xsk code as well as af_xdp zero-copy drivers and as a bonus gets rid of one internal header file. The i40e driver is also adapted to the new interface in this commit. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
cf484f9f |
|
31-Aug-2018 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: fix possible compiler warning in xsk TX path With certain gcc versions, it was possible to get the warning "'tx_desc' may be used uninitialized in this function" for the i40e_xmit_zc. This was not possible, however this commit simplifies the code path so that this warning is no longer emitted. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
1328dcdd |
|
28-Aug-2018 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: add AF_XDP zero-copy Tx support This patch adds zero-copy Tx support for AF_XDP sockets. It implements the ndo_xsk_async_xmit netdev ndo and performs all the Tx logic from a NAPI context. This means pulling egress packets from the Tx ring, placing the frames on the NIC HW descriptor ring and completing sent frames back to the application via the completion ring. The regular XDP Tx ring is used for AF_XDP as well. This rationale for this is as follows: XDP_REDIRECT guarantees mutual exclusion between different NAPI contexts based on CPU id. In other words, a netdev can XDP_REDIRECT to another netdev with a different NAPI context, since the operation is bound to a specific core and each core has its own hardware ring. As the AF_XDP Tx action is running in the same NAPI context and using the same ring, it will also be protected from XDP_REDIRECT actions with the exact same mechanism. As with AF_XDP Rx, all AF_XDP Tx specific functions are added to i40e_xsk.c. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
0a714186 |
|
28-Aug-2018 |
Björn Töpel <bjorn@kernel.org> |
i40e: add AF_XDP zero-copy Rx support This patch adds zero-copy Rx support for AF_XDP sockets. Instead of allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain queue. All AF_XDP specific functions are added to a new file, i40e_xsk.c. Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS will allocate a new buffer and copy the zero-copy frame prior passing it to the kernel stack. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|