#
7033ada0 |
|
27-Apr-2024 |
Ivan Vecera <ivecera@redhat.com> |
i40e: Refactor argument of i40e_detect_recover_hung() Commit 07d44190a389 ("i40e/i40evf: Detect and recover hung queue scenario") changes i40e_detect_recover_hung() argument type from i40e_pf* to i40e_vsi* to be shareable by both i40e and i40evf. Because the i40evf does not exist anymore and the function is exclusively used by i40e we can revert this change. Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
ea558de7 |
|
15-Mar-2024 |
Ivan Vecera <ivecera@redhat.com> |
i40e: Enforce software interrupt during busy-poll exit As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts during poll exit") followed by commit 23be7075b318 ("ice: fix software generating extra interrupts") I'm seeing the similar issue also with i40e driver. In certain situation when busy-loop is enabled together with adaptive coalescing, the driver occasionally misses that there are outstanding descriptors to clean when exiting busy poll. Try to catch the remaining work by triggering a software interrupt when exiting busy poll. No extra interrupts will be generated when busy polling is not used. The issue was found when running sockperf ping-pong tcp test with adaptive coalescing and busy poll enabled (50 as value busy_pool and busy_read sysctl knobs) and results in huge latency spikes with more than 100000us. The fix is inspired from the ice driver and do the following: 1) During napi poll exit in case of busy-poll (napo_complete_done() returns false) this is recorded to q_vector that we were in busy loop. 2) Extends i40e_buildreg_itr() to be able to add an enforced software interrupt into built value 2) In i40e_update_enable_itr() enforces a software interrupt trigger if we are exiting busy poll to catch any pending clean-ups 3) Reuses unused 3rd ITR (interrupt throttle) index and set it to 20K interrupts per second to limit the number of these sw interrupts. Test results ============ Prior: [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120 sockperf: == version #3.10-no.git == sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s) [ 0] IP = 10.9.9.1 PORT = 11111 # TCP sockperf: Warmup stage (sending a few dummy messages)... sockperf: Starting test... sockperf: Test end (interrupted by timer) sockperf: Test ended sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562 sockperf: ========= Printing statistics for Server No: 0 sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473 sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725]) sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0 sockperf: Summary: Latency is 24.571 usec sockperf: Total 2429473 observations; each percentile contains 24294.73 observations sockperf: ---> <MAX> observation = 103294.331 sockperf: ---> percentile 99.999 = 45.633 sockperf: ---> percentile 99.990 = 37.013 sockperf: ---> percentile 99.900 = 35.910 sockperf: ---> percentile 99.000 = 33.390 sockperf: ---> percentile 90.000 = 28.626 sockperf: ---> percentile 75.000 = 27.741 sockperf: ---> percentile 50.000 = 26.743 sockperf: ---> percentile 25.000 = 25.614 sockperf: ---> <MIN> observation = 12.220 After: [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120 sockperf: == version #3.10-no.git == sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s) [ 0] IP = 10.9.9.1 PORT = 11111 # TCP sockperf: Warmup stage (sending a few dummy messages)... sockperf: Starting test... sockperf: Test end (interrupted by timer) sockperf: Test ended sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054 sockperf: ========= Printing statistics for Server No: 0 sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186 sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975]) sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0 sockperf: Summary: Latency is 24.965 usec sockperf: Total 2391186 observations; each percentile contains 23911.86 observations sockperf: ---> <MAX> observation = 195.841 sockperf: ---> percentile 99.999 = 45.026 sockperf: ---> percentile 99.990 = 39.009 sockperf: ---> percentile 99.900 = 35.922 sockperf: ---> percentile 99.000 = 33.482 sockperf: ---> percentile 90.000 = 28.902 sockperf: ---> percentile 75.000 = 27.821 sockperf: ---> percentile 50.000 = 26.860 sockperf: ---> percentile 25.000 = 25.685 sockperf: ---> <MIN> observation = 12.277 Fixes: 0bcd952feec7 ("ethernet/intel: consolidate NAPI and NAPI exit") Reported-by: Hugo Ferreira <hferreir@redhat.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
0e8b9fdd |
|
13-Nov-2023 |
Ivan Vecera <ivecera@redhat.com> |
i40e: Consolidate hardware capabilities Fields .caps in i40e_hw and .hw_features in i40e_pf both indicate capabilities provided by hardware. Move and merge i40e_pf.hw_features into i40e_hw.caps as this is more appropriate place for them and adjust their names to I40E_HW_CAP_... convention. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20231113231047.548659-9-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
70756d0a |
|
13-Nov-2023 |
Ivan Vecera <ivecera@redhat.com> |
i40e: Use DECLARE_BITMAP for flags and hw_features fields in i40e_pf Convert flags and hw_features fields from i40e_pf from u32 to bitmaps and their usage to use bit access functions. Changes: - Convert "pf_ptr->(flags|hw_features) & FL" to "test_bit(FL, ...)" - Convert "pf_ptr->(flags|hw_features) |= FL" to "set_bit(FL, ...)" - Convert "pf_ptr->(flags|hw_features) &= ~FL" to "clear_bit(FL, ...)" - Rename flag field to bitno in i40e_priv_flags and adjust ethtool callbacks to work with flags bitmap - Rename flag names where '_ENABLED'->'_ENA' and '_DISABLED'->'_DIS' like in ice driver Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20231113231047.548659-7-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
addca917 |
|
13-Nov-2023 |
Ivan Vecera <ivecera@redhat.com> |
i40e: Remove _t suffix from enum type names Enum type names should not be suffixed by '_t'. Either to use 'typedef enum name name_t' to so plain 'name_t var' instead of 'enum name_t var'. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20231113231047.548659-6-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
56df3459 |
|
27-Sep-2023 |
Ivan Vecera <ivecera@redhat.com> |
i40e: Remove circular header dependencies and fix headers Similarly as for ice driver [1] there are also circular header dependencies in i40e driver: i40e.h -> i40e_virtchnl_pf.h -> i40e.h Another issue is that i40e header files does not contain their own dependencies on other header files (both private and standard) so their inclusion in .c file require to add these deps in certain order to that .c file to make it compilable. Fix both issues by removal the mentioned circular dependency, by filling i40e headers with their dependencies so they can be placed anywhere in a source code. Additionally remove bunch of includes from i40e.h super header file that are not necessary and include i40e.h only in .c files that really require it. [1] 649c87c6ff52 ("ice: remove circular header dependencies on ice.h") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
1c9ba9c1 |
|
19-Jul-2023 |
Tirthendu Sarkar <tirthendu.sarkar@intel.com> |
i40e: xsk: add RX multi-buffer support This patch is inspired from the multi-buffer support in non-zc path for i40e as well as from the patch to support zc on ice. Each subsequent frag is added to skb_shared_info of the first frag for possible xdp_prog use as well to xsk buffer list for accessing the buffers in af_xdp. For XDP_PASS, new pages are allocated for frags and contents are copied from memory backed by xsk_buff_pool. Replace next_to_clean with next_to_process as done in non-zc path and advance it for every buffer and change the semantics of next_to_clean to point to the first buffer of a packet. Driver will use next_to_process in the same way next_to_clean was used previously. For the non multi-buffer case, next_to_process and next_to_clean will always be the same since each packet consists of a single buffer. Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-14-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
e213ced1 |
|
09-Mar-2023 |
Tirthendu Sarkar <tirthendu.sarkar@intel.com> |
i40e: add support for XDP multi-buffer Rx This patch adds multi-buffer support for the i40e_driver. i40e_clean_rx_irq() is modified to collate all the buffers of a packet before calling the XDP program. xdp_buff is built for the first frag of the packet and subsequent frags are added to it. 'next_to_process' is incremented for all non-EOP frags while 'next_to_clean' stays at the first descriptor of the packet. XDP program is called only on receiving EOP frag. New functions are added for adding frags to xdp_buff and for post processing of the buffers once the xdp prog has run. For XDP_PASS this results in a skb with multiple fragments. i40e_build_skb() builds the skb around xdp buffer that already contains frags data. So i40e_add_rx_frag() helper function is now removed. Since fields before 'dataref' in skb_shared_info are cleared during napi_skb_build(), xdp_update_skb_shared_info() is called to set those. For i40e_construct_skb(), all the frags data needs to be copied from xdp_buffer's shared_skb_info to newly constructed skb's shared_skb_info. This also means 'skb' does not need to be preserved across i40e_napi_poll() calls and hence is removed from i40e_ring structure. Previously i40e_alloc_rx_buffers() was called for every 32 cleaned buffers. For multi-buffers this may not be optimal as there may be more cleaned buffers in each i40e_clean_rx_irq() call. So this is now called when at least half of the ring size has been cleaned. Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
01aa49e3 |
|
09-Mar-2023 |
Tirthendu Sarkar <tirthendu.sarkar@intel.com> |
i40e: add xdp_buff to i40e_ring struct Store xdp_buff on Rx ring struct in preparation for XDP multi-buffer support. This will allow us to combine fragmented frames across separate NAPI cycles in the same way as currently skb fragments are handled. This means that skb pointer on Rx ring will become redundant and will be removed in a later patch. As a consequence i40e_trace() now uses xdp instead of skb pointer. Truesize only needs to be calculated for page sizes bigger than 4k as it is always half-page for 4k pages. With xdp_buff on ring, frame size can now be set during xdp_init_buff() and need not be repopulated in each NAPI call for 4k pages. As a consequence i40e_rx_frame_truesize() is now used only for bigger pages. Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
e9031f2d |
|
09-Mar-2023 |
Tirthendu Sarkar <tirthendu.sarkar@intel.com> |
i40e: introduce next_to_process to i40e_ring Add a new field called next_to_process in the i40e_ring that is advanced for every buffer and change the semantics of next_to_clean to point to the first buffer of a packet. Driver will use next_to_process in the same way next_to_clean was used previously. For the non multi-buffer case, next_to_process and next_to_clean will always be the same since each packet consists of a single buffer. Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
e2843f03 |
|
09-Mar-2023 |
Tirthendu Sarkar <tirthendu.sarkar@intel.com> |
i40e: add pre-xdp page_count in rx_buffer Page count of rx_buffer needs to be stored prior to XDP call to prevent page recycling in case that buffer would be freed within xdp redirect path. Instead of storing it on the stack, now it is stored in the rx_buffer struct. This will help in processing multi-buffers as the page counts of all rx_buffers (of the same packet) don't need to be stored on stack. Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
aae425ef |
|
12-Oct-2022 |
Jan Sokolowski <jan.sokolowski@intel.com> |
i40e: Fix DMA mappings leak During reallocation of RX buffers, new DMA mappings are created for those buffers. steps for reproduction: while : do for ((i=0; i<=8160; i=i+32)) do ethtool -G enp130s0f0 rx $i tx $i sleep 0.5 ethtool -g enp130s0f0 done done This resulted in crash: i40e 0000:01:00.1: Unable to allocate memory for the Rx descriptor ring, size=65536 Driver BUG WARNING: CPU: 0 PID: 4300 at net/core/xdp.c:141 xdp_rxq_info_unreg+0x43/0x50 Call Trace: i40e_free_rx_resources+0x70/0x80 [i40e] i40e_set_ringparam+0x27c/0x800 [i40e] ethnl_set_rings+0x1b2/0x290 genl_family_rcv_msg_doit.isra.15+0x10f/0x150 genl_family_rcv_msg+0xb3/0x160 ? rings_fill_reply+0x1a0/0x1a0 genl_rcv_msg+0x47/0x90 ? genl_family_rcv_msg+0x160/0x160 netlink_rcv_skb+0x4c/0x120 genl_rcv+0x24/0x40 netlink_unicast+0x196/0x230 netlink_sendmsg+0x204/0x3d0 sock_sendmsg+0x4c/0x50 __sys_sendto+0xee/0x160 ? handle_mm_fault+0xbe/0x1e0 ? syscall_trace_enter+0x1d3/0x2c0 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca RIP: 0033:0x7f5eac8b035b Missing register, driver bug WARNING: CPU: 0 PID: 4300 at net/core/xdp.c:119 xdp_rxq_info_unreg_mem_model+0x69/0x140 Call Trace: xdp_rxq_info_unreg+0x1e/0x50 i40e_free_rx_resources+0x70/0x80 [i40e] i40e_set_ringparam+0x27c/0x800 [i40e] ethnl_set_rings+0x1b2/0x290 genl_family_rcv_msg_doit.isra.15+0x10f/0x150 genl_family_rcv_msg+0xb3/0x160 ? rings_fill_reply+0x1a0/0x1a0 genl_rcv_msg+0x47/0x90 ? genl_family_rcv_msg+0x160/0x160 netlink_rcv_skb+0x4c/0x120 genl_rcv+0x24/0x40 netlink_unicast+0x196/0x230 netlink_sendmsg+0x204/0x3d0 sock_sendmsg+0x4c/0x50 __sys_sendto+0xee/0x160 ? handle_mm_fault+0xbe/0x1e0 ? syscall_trace_enter+0x1d3/0x2c0 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca RIP: 0033:0x7f5eac8b035b This was caused because of new buffers with different RX ring count should substitute older ones, but those buffers were freed in i40e_configure_rx_ring and reallocated again with i40e_alloc_rx_bi, thus kfree on rx_bi caused leak of already mapped DMA. Fix this by reallocating ZC with rx_bi_zc struct when BPF program loads. Additionally reallocate back to rx_bi when BPF program unloads. If BPF program is loaded/unloaded and XSK pools are created, reallocate RX queues accordingly in XSP_SETUP_XSK_POOL handler. Fixes: be1222b585fd ("i40e: Separate kernel allocated rx_bi rings from AF_XDP rings") Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan <chandanx.rout@intel.com> (A Contingent Worker at Intel) Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f728fa01 |
|
24-Mar-2022 |
Joe Damato <jdamato@fastly.com> |
i40e: Add tx_stopped stat Track TX queue stop events and export the new stat with ethtool. Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
b76bc129 |
|
17-Dec-2021 |
Joe Damato <jdamato@fastly.com> |
i40e: Add a stat for tracking busy rx pages In some cases, pages cannot be reused by i40e because the page is busy. Add a counter for this event. Busy page count is accessible via ethtool. Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
cb963b98 |
|
17-Dec-2021 |
Joe Damato <jdamato@fastly.com> |
i40e: Add a stat for tracking pages waived In some cases, pages can not be reused because they are not associated with the correct NUMA zone. Knowing how often pages are waived helps users to understand the interaction between the driver's memory usage and their system. Pass rx_stats through to i40e_can_reuse_rx_page to allow tracking when pages are waived. The page waive count is accessible via ethtool. Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
453f8305 |
|
17-Dec-2021 |
Joe Damato <jdamato@fastly.com> |
i40e: Add a stat tracking new RX page allocations Add a counter for new page allocations in the i40e RX path. This stat is accessible with ethtool. Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
79f227c4 |
|
05-Jan-2022 |
Joe Damato <jdamato@fastly.com> |
i40e: Remove unused RX realloc stat After commit 1a557afc4dd5 ("i40e: Refactor receive routine"), rx_stats.realloc_count is no longer being incremented, so remove it. The debugfs string was left, but hardcoded to 0. This is intended to prevent breaking any existing code / scripts that are parsing debugfs for i40e. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
d1bc532e |
|
25-Jan-2022 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: xsk: Move tmp desc array from driver to pool Move desc_array from the driver to the pool. The reason behind this is that we can then reuse this array as a temporary storage for descriptors in all zero-copy drivers that use the batched interface. This will make it easier to add batching to more drivers. i40e is the only driver that has a batched Tx zero-copy implementation, so no need to touch any other driver. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20220125160446.78976-6-maciej.fijalkowski@intel.com
|
#
89ec1f08 |
|
01-Jun-2021 |
Jedrzej Jagielski <jedrzej.jagielski@intel.com> |
i40e: Fix queue-to-TC mapping on Tx In SW DCB mode the packets sent receive incorrect UP tags. They are constructed correctly and put into tx_ring, but UP is later remapped by HW on the basis of TCTUPR register contents according to Tx queue selected, and BW used is consistent with the new UP values. This is caused by Tx queue selection in kernel not taking into account DCB configuration. This patch fixes the issue by implementing the ndo_select_queue NDO callback. Fixes: fd0a05ce74ef ("i40e: transmit, receive, and NAPI") Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Imam Hassan Reza Biswas <imam.hassan.reza.biswas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
f7bb0d71 |
|
18-Jan-2021 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
i40e: store the result of i40e_rx_offset() onto i40e_ring Output of i40e_rx_offset() is based on ethtool's priv flag setting, which when changed, causes PF reset (disables napi, frees irqs, loads different Rx mem model, etc.). This means that within napi its result is constant and there is no reason to call it per each processed frame. Add new 'rx_offset' field to i40e_ring that is meant to hold the i40e_rx_offset() result and use it within i40e_clean_rx_irq(). Furthermore, use it within i40e_alloc_mapped_page(). Last but not least, un-inline the function of interest so that compiler makes the decision about inlining as it lives in .c file. Reviewed-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
3106c580 |
|
15-Nov-2020 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: Use batched xsk Tx interfaces to increase performance Use the new batched xsk interfaces for the Tx path in the i40e driver to improve performance. On my machine, this yields a throughput increase of 4% for the l2fwd sample app in xdpsock. If we instead just look at the Tx part, this patch set increases throughput with above 20% for Tx. Note that I had to explicitly loop unroll the inner loop to get to this performance level, by using a pragma. It is honored by both clang and gcc and should be ignored by versions that do not support it. Using the -funroll-loops compiler command line switch on the source file resulted in a loop unrolling on a higher level that lead to a performance decrease instead of an increase. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1605525167-14450-6-git-send-email-magnus.karlsson@gmail.com
|
#
b50f7bca |
|
25-Sep-2020 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
intel-ethernet: clean up W=1 warnings in kdoc This takes care of all of the trivial W=1 fixes in the Intel Ethernet drivers, which allows developers and maintainers to build more of the networking tree with more complete warning checks. There are three classes of kdoc warnings fixed: - cannot understand function prototype: 'x' - Excess function parameter 'x' description in 'y' - Function parameter or member 'x' not described in 'y' All of the changes were trivial comment updates on function headers. Inspired by Lee Jones' series of wireless work to do the same. Compile tested only, and passes simple test of $ git ls-files *.[ch] | egrep drivers/net/ethernet/intel | \ xargs scripts/kernel-doc -none Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f0064bfd |
|
25-Aug-2020 |
Björn Töpel <bjorn@kernel.org> |
i40e: use 16B HW descriptors instead of 32B The i40e NIC supports two flavors of HW descriptors, 16 and 32 byte. The latter has, obviously, room for more offloading information. However, the only fields of the 32B HW descriptor that is being used by the driver, is also available in the 16B descriptor. In other words; Reading and writing 32 bytes instead of 16 byte is a waste of bus bandwidth. This commit starts using 16 byte descriptors instead of 32 byte descriptors. For AF_XDP the rx_drop benchmark was improved by 2%. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
1742b3d5 |
|
28-Aug-2020 |
Magnus Karlsson <magnus.karlsson@intel.com> |
xsk: i40e: ice: ixgbe: mlx5: Pass buffer pool to driver instead of umem Replace the explicit umem reference passed to the driver in AF_XDP zero-copy mode with the buffer pool instead. This in preparation for extending the functionality of the zero-copy mode so that umems can be shared between queues on the same netdev and also between netdevs. In this commit, only an umem reference has been added to the buffer pool struct. But later commits will add other entities to it. These are going to be entities that are different between different queue ids and netdevs even though the umem is shared between them. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com
|
#
5574ff7b |
|
23-Jun-2020 |
Magnus Karlsson <magnus.karlsson@intel.com> |
i40e: optimize AF_XDP Tx completion path Improve the performance of the AF_XDP zero-copy Tx completion path. When there are no XDP buffers being sent using XDP_TX or XDP_REDIRECT, we do not have go through the SW ring to clean up any entries since the AF_XDP path does not use these. In these cases, just fast forward the next-to-use counter and skip going through the SW ring. The limit on the maximum number of entries to complete is also removed since the algorithm is now O(1). To simplify the code path, the maximum number of entries to complete for the XDP path is therefore also increased from 256 to 512 (the default number of Tx HW descriptors). This should be fine since the completion in the XDP path is faster than in the SKB path that has 256 as the maximum number. This patch provides around 4% throughput improvement for the l2fwd application in xdpsock on my machine. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
#
3c98f9ee |
|
06-Jan-2020 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e: remove unused defines Remove all the unused defines as they are just dead weight. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
3b4f0b66 |
|
20-May-2020 |
Björn Töpel <bjorn@kernel.org> |
i40e, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL Remove MEM_TYPE_ZERO_COPY in favor of the new MEM_TYPE_XSK_BUFF_POOL APIs. The AF_XDP zero-copy rx_bi ring is now simply a struct xdp_buff pointer. v4->v5: Fixed "warning: Excess function parameter 'bi' description in 'i40e_construct_skb_zc'". (Jakub) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/bpf/20200520192103.355233-9-bjorn.topel@gmail.com
|
#
be1222b5 |
|
20-May-2020 |
Björn Töpel <bjorn@kernel.org> |
i40e: Separate kernel allocated rx_bi rings from AF_XDP rings Continuing the path to support MEM_TYPE_XSK_BUFF_POOL, the AF_XDP zero-copy/sk_buff rx_bi rings are now separate. Functions to properly allocate the different rings are added as well. v3->v4: Made i40e_fd_handle_status() static. (kbuild test robot) v4->v5: Fix kdoc for i40e_clean_programming_status(). (Jakub) Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/bpf/20200520192103.355233-8-bjorn.topel@gmail.com
|
#
d7840976 |
|
22-Jul-2019 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
net: Use skb accessors in network drivers In preparation for unifying the skb_frag and bio_vec, use the fine accessors which already exist and use skb_frag_t instead of struct skb_frag_struct. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0a714186 |
|
28-Aug-2018 |
Björn Töpel <bjorn@kernel.org> |
i40e: add AF_XDP zero-copy Rx support This patch adds zero-copy Rx support for AF_XDP sockets. Instead of allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain queue. All AF_XDP specific functions are added to a new file, i40e_xsk.c. Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS will allocate a new buffer and copy the zero-copy frame prior passing it to the kernel stack. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
763ea096 |
|
05-Jun-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
i40e: remove ndo_xdp_flush call i40e_xdp_flush Remove the ndo_xdp_flush call implementation i40e_xdp_flush as no callers of ndo_xdp_flush are left. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
42b33468 |
|
31-May-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
xdp: add flags argument to ndo_xdp_xmit API This patch only change the API and reject any use of flags. This is an intermediate step that allows us to implement the flush flag operation later, for each individual driver in a separate patch. The plan is to implement flush operation via XDP_XMIT_FLUSH flag and then remove XDP_XMIT_FLAGS_NONE when done. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
735fc405 |
|
24-May-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
xdp: change ndo_xdp_xmit API to support bulking This patch change the API for ndo_xdp_xmit to support bulking xdp_frames. When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown. Most of the slowdown is caused by DMA API indirect function calls, but also the net_device->ndo_xdp_xmit() call. Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed performance improved: for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps With frames avail as a bulk inside the driver ndo_xdp_xmit call, further optimizations are possible, like bulk DMA-mapping for TX. Testing without CONFIG_RETPOLINE show the same performance for physical NIC drivers. The virtual NIC driver tun sees a huge performance boost, as it can avoid doing per frame producer locking, but instead amortize the locking cost over the bulk. V2: Fix compile errors reported by kbuild test robot <lkp@intel.com> V4: Isolated ndo, driver changes and callers. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
51dce24b |
|
26-Apr-2018 |
Jeff Kirsher <jeffrey.t.kirsher@intel.com> |
net: intel: Cleanup the copyright/license headers After many years of having a ~30 line copyright and license header to our source files, we are finally able to reduce that to one line with the advent of the SPDX identifier. Also caught a few files missing the SPDX license identifier, so fixed them up. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
44fa2dbd |
|
17-Apr-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
xdp: transition into using xdp_frame for ndo_xdp_xmit Changing API ndo_xdp_xmit to take a struct xdp_frame instead of struct xdp_buff. This brings xdp_return_frame and ndp_xdp_xmit in sync. This builds towards changing the API further to become a bulk API, because xdp_buff is not a queue-able object while xdp_frame is. V4: Adjust for commit 59655a5b6c83 ("tuntap: XDP_TX can use native XDP") V7: Adjust for commit d9314c474d4f ("i40e: add support for XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b411ef11 |
|
17-Apr-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
i40e: convert to use generic xdp_frame and xdp_return_frame API Also convert driver i40e, which very recently got XDP_REDIRECT support in commit d9314c474d4f ("i40e: add support for XDP_REDIRECT"). V7: This patch got added in V7 of this patchset. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d9314c47 |
|
22-Mar-2018 |
Björn Töpel <bjorn@kernel.org> |
i40e: add support for XDP_REDIRECT The driver now acts upon the XDP_REDIRECT return action. Two new ndos are implemented, ndo_xdp_xmit and ndo_xdp_flush. XDP_REDIRECT action enables XDP program to redirect frames to other netdevs. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
ae06c70b |
|
22-Mar-2018 |
Jeff Kirsher <jeffrey.t.kirsher@intel.com> |
intel: add SPDX identifiers to all the Intel drivers Add the SPDX identifiers to all the Intel wired LAN driver files, as outlined in Documentation/process/license-rules.rst. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
04d41051 |
|
12-Feb-2018 |
Alan Brady <alan.brady@intel.com> |
i40e/i40evf: use SW variables for hang detection The i40e_detect_recover_hung function uses the i40e_get_tx_pending function to determine if there are packets stalled on the ring. i40e_get_tx_pending calculates the pending packets using the head writeback value and HW tail. If the queue is stopped and we lose the interrupt to update our next_to_clean then we a) won't get another interrupt to clean because queue is stopped b) we won't catch the problem with i40e_detect_recover_hung because the HW values look like there's no packets waiting to be transmitted. Using the SW values we can catch the issue because next_to_clean will be out of sync with head writeback. This has the added benefit being less CPU intensive because we don't need to reach into the hardware to get the values. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
a0073a4b |
|
29-Dec-2017 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e/i40evf: Add support for new mechanism of updating adaptive ITR This patch replaces the existing mechanism for determining the correct value to program for adaptive ITR with yet another new and more complicated approach. The basic idea from a 30K foot view is that this new approach will push the Rx interrupt moderation up so that by default it starts in low latency and is gradually pushed up into a higher latency setup as long as doing so increases the number of packets processed, if the number of packets drops to 4 to 1 per packet we will reset and just base our ITR on the size of the packets being received. For Tx we leave it floating at a high interrupt delay and do not pull it down unless we start processing more than 112 packets per interrupt. If we start exceeding that we will cut our interrupt rates in half until we are back below 112. The side effect of these patches are that we will be processing more packets per interrupt. This is both a good and a bad thing as it means we will not be blocking processing in the case of things like pktgen and XDP, but we will also be consuming a bit more CPU in the cases of things such as network throughput tests using netperf. One delta from this versus the ixgbe version of the changes is that I have made the interrupt moderation a bit more aggressive when we are in bulk mode by moving our "goldilocks zone" up from 48 to 96 to 56 to 112. The main motivation behind moving this is to address the fact that we need to update less frequently, and have more fine grained control due to the separate Tx and Rx ITR times. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
556fdfd6 |
|
29-Dec-2017 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e/i40evf: Split container ITR into current_itr and target_itr This patch is mostly prep-work for replacing the current approach to programming the dynamic aka adaptive ITR. Specifically here what we are doing is splitting the Tx and Rx ITR each into two separate values. The first value current_itr represents the current value of the register. The second value target_itr represents the desired value of the register. The general plan by doing this is to allow for deferring the update of the ITR value under certain circumstances. For now we will work with what we have, but in the future I hope to change the behavior so that we always only update one ITR at a time using some simple logic to determine which ITR requires an update. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
92418fb1 |
|
29-Dec-2017 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e/i40evf: Use usec value instead of reg value for ITR defines Instead of using the register value for the defines when setting up the ring ITR we can just use the actual values and avoid the use of shifts and macros to translate between the values we have and the values we want. This helps to make the code more readable as we can quickly translate from one value to the other. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
40588ca6 |
|
29-Dec-2017 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e/i40evf: Only track one ITR setting per ring instead of Tx/Rx The rings are already split out into Tx and Rx rings so it doesn't make sense to have any single ring store both a Tx and Rx itr_setting value. Since that is the case drop the pair in favor of storing just a single ITR value. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
0a797db3 |
|
26-Jan-2018 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e/i40evf: Update DESC_NEEDED value to reflect larger value When compared to ixgbe and other previous Intel drivers the i40e and i40evf drivers actually reserve 2 additional descriptors in maybe_stop_tx for cache line alignment. We need to update DESC_NEEDED to reflect this as otherwise we are more likely to return TX_BUSY which will cause issues with things like xmit_more. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
07d44190 |
|
18-Dec-2017 |
Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> |
i40e/i40evf: Detect and recover hung queue scenario In VFs, there is a known issue which can cause writebacks to not occur when interrupts are disabled and there are less than 4 descriptors resulting in TX timeout. Timeout can also occur due to lost interrupt. The current implementation for detecting and recovering from hung queues in the PF is problematic because it actually actively encourages lost interrupts. By triggering a SW interrupt, interrupts are forced on. If we are already in napi_poll and an interrupt fires, napi_poll will not be rescheduled and the interrupt is effectively lost; thereby potentially *causing* hung queues. This patch checks whether packets are being processed between every watchdog cycle and determine potential hung queue and fires triggers SW interrupt only for that particular queue. Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
87128824 |
|
03-Jan-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
i40e: setup xdp_rxq_info The i40e driver has a special "FDIR" RX-ring (I40E_VSI_FDIR) which is a sideband channel for configuring/updating the flow director tables. This (i40e_vsi_)type does not invoke XDP-ebpf code. As suggested by Björn (V2): Instead of marking this I40E_VSI_FDIR RX-ring a special case, reverse the logic and only select RX-rings of type I40E_VSI_MAIN to register xdp_rxq_info's for. Driver hook points for xdp_rxq_info: * reg : i40e_setup_rx_descriptors (via i40e_vsi_setup_rx_resources) * unreg: i40e_free_rx_resources (via i40e_vsi_free_rx_resources) Tested on actual hardware with samples/bpf program. V2: Fixed bug in i40e_set_ringparam (memset zero) + match on I40E_VSI_MAIN. V4: Update patch desc that got out-of-sync with code. Cc: intel-wired-lan@lists.osuosl.org Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
8f88b303 |
|
07-Sep-2017 |
Amritha Nambiar <amritha.nambiar@intel.com> |
i40e: Add infrastructure for queue channel support This patch sets up the infrastructure for offloading TCs and queue configurations to the hardware by creating HW channels(VSI). A new channel is created for each of the traffic class configuration offloaded via mqprio framework except for the first TC (TC0). TC0 for the main VSI is also reconfigured as per user provided queue parameters. Queue counts that are not power-of-2 are handled by reconfiguring RSS by reprogramming LUTs using the queue count value. This patch also handles configuring the TX rings for the channels, setting up the RX queue map for channel. Also, the channels so created are removed and all the queue configuration is set to default when the qdisc is detached from the root of the device. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
95bc2fb4 |
|
07-Sep-2017 |
Jacob Keller <jacob.e.keller@intel.com> |
i40e/i40evf: bundle more descriptors when allocating buffers Double the number of descriptors we'll bundle into one tail bump when receiving. Empirical testing has shown that we reduce CPU utilization and don't appear to reduce throughput or packet rate. 32 seems to be the sweet spot, as it's half the default polling budget, so we'd essentially reduce from 4 tail writes when polling down to 2. Increasing this up to 64 appears to have negative impacts as it may become possible that we don't bump the tail each time we get polled, which could cause a long delay between returning descriptors to the hardware. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
42702559 |
|
07-Sep-2017 |
Jacob Keller <jacob.e.keller@intel.com> |
i40e/i40evf: fix incorrect default ITR values on driver load The ITR register expects to be programmed in units of 2 microseconds. Because of this, all of the drivers I40E_ITR_* constants are in terms of this 2 microsecond register. Unfortunately, the rx_itr_default value is expected to be programmed in microseconds. Effectively the driver defaults to an ITR value of half the expected value (in terms of minimum microseconds between interrupts). Fix this by changing the default values to be calculated using ITR_REG_TO_USEC macro which indicates that we're converting from the register units into microseconds. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
bd6cd4e6 |
|
29-Aug-2017 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e/i40evf: use DECLARE_BITMAP for state When using set_bit and friends, we should be using actual bitmaps, and fix all the locations where we might access it. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
742c9875 |
|
14-Jul-2017 |
Jacob Keller <jacob.e.keller@intel.com> |
i40e/i40evf: avoid dynamic ITR updates when polling or low packet rate The dynamic ITR algorithm depends on a calculation of usecs which assumes that the interrupts have been firing constantly at the interrupt throttle rate. This is not guaranteed because we could have a low packet rate, or have been polling in software. We'll estimate whether this is the case by using jiffies to determine if we've been too long. If the time difference of jiffies is larger we are guaranteed to have an incorrect calculation. If the time difference of jiffies is smaller we might have been polling some but the difference shouldn't affect the calculation too much. This ensures that we don't get stuck in BULK latency during certain rare situations where we receive bursts of packets that force us into NAPI polling. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
0a2c7722 |
|
14-Jul-2017 |
Jacob Keller <jacob.e.keller@intel.com> |
i40e/i40evf: remove ULTRA latency mode Since commit c56625d59726 ("i40e/i40evf: change dynamic interrupt thresholds") a new higher latency ITR setting called I40E_ULTRA_LATENCY was added with a cryptic comment about how it was meant for adjusting Rx more aggressively when streaming small packets. This mode was attempting to calculate packets per second and then kick in when we have a huge number of small packets. Unfortunately, the ULTRA setting was kicking in for workloads it wasn't intended for including single-thread UDP_STREAM workloads. This wasn't caught for a variety of reasons. First, the ip_defrag routines were improved somewhat which makes the UDP_STREAM test still reasonable at 10GbE, even when dropped down to 8k interrupts a second. Additionally, some other obvious workloads appear to work fine, such as TCP_STREAM. The number 40k doesn't make sense for a number of reasons. First, we absolutely can do more than 40k packets per second. Second, we calculate the value inline in an integer, which sometimes can overflow resulting in using incorrect values. If we fix this overflow it makes it even more likely that we'll enter ULTRA mode which is the opposite of what we want. The ULTRA mode was added originally as a way to reduce CPU utilization during a small packet workload where we weren't keeping up anyways. It should never have been kicking in during these other workloads. Given the issues outlined above, let's remove the ULTRA latency mode. If necessary, a better solution to the CPU utilization issue for small packet workloads will be added in a future patch. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
d36e41dc |
|
23-Jun-2017 |
Jacob Keller <jacob.e.keller@intel.com> |
i40e: separate hw_features from runtime changing flags The number of flags found in pf->flags has grown quite large, and there are a lot of different types of flags. Most of the flags are simply hardware features which are enabled on some firmware or some MAC types. Other flags are dynamic run-time flags which enable or disable certain features of the driver. Separate these two types of flags into pf->hw_features and pf->flags. The hw_features list will contain a set of features which are enabled at init time. This will not contain toggles or otherwise dynamically changing features. These flags should not need atomic protections, as they will be set once during init and then be essentially read only. Everything else will remain in the flags variable. These flags may be modified at any time during run time. A future patch may wish to convert these flags into set_bit/clear_bit/test_bit or similar approach to ensure atomic correctness. The I40E_FLAG_MFP_ENABLED flag may be a good fit for hw_features but currently is used by ethtool in the private flags settings, and thus has been left as part of flags. Additionally, I40E_FLAG_DCB_CAPABLE may be a good fit for the hw_features but this patch has not tried to untangle it yet. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
1e3a5fd5 |
|
23-Jun-2017 |
Mitch Williams <mitch.a.williams@intel.com> |
i40e/i40evf: adjust packet size to account for double VLANs Now that the kernel supports double VLAN tags, we should at least play nice. Adjust the max packet size to account for two VLAN tags, not just one. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
74608d17 |
|
23-May-2017 |
Björn Töpel <bjorn@kernel.org> |
i40e: add support for XDP_TX action This patch adds proper XDP_TX action support. For each Tx ring, an additional XDP Tx ring is allocated and setup. This version does the DMA mapping in the fast-path, which will penalize performance for IOMMU enabled systems. Further, debugfs support is not wired up for the XDP Tx rings. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
0c8493d9 |
|
23-May-2017 |
Björn Töpel <bjorn@kernel.org> |
i40e: add XDP support for pass and drop actions This commit adds basic XDP support for i40e derived NICs. All XDP actions will end up in XDP_DROP. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
ca9ec088 |
|
05-Apr-2017 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e/i40evf: Add support for padding start of frames This patch adds padding to the start of frames to make room for headroom for us to eventually start using build_skb. Right now we guarantee at least NET_SKB_PAD + NET_IP_ALIGN, however we allocate more space if more is available. For example on x86 the headroom should be 192 bytes. On systems that have too large of a cache line size to support storing 1.5K padding and shared info we default to using 3K buffers and reserve everything that isn't used for skb_shared_info or the data buffer for headroom. Change-ID: I33c641c9a1ea10cf7cc484c2d20985368d2d709a Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
98efd694 |
|
05-Apr-2017 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e/i40evf: Add support for using order 1 pages with a 3K buffer There are situations where adding padding to the front and back of an Rx buffer will require that we add additional padding. Specifically if NET_IP_ALIGN is non-zero, or the MTU size is larger than 7.5K we would need to use 2K buffers which leaves us with no room for the padding. To preemptively address these cases I am adding support for 3K buffers to the Rx path so that we can provide the additional padding needed in the event of NET_IP_ALIGN being non-zero or a cache line being greater than 64. Change-ID: I938bc1ba611285428df39a613cd66f98e60b55c7 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
17daabb5 |
|
05-Apr-2017 |
Alan Brady <alan.brady@intel.com> |
i40e: Simplify i40e_detect_recover_hung_queue logic This patch greatly reduces the unneeded complexity in the i40e_detect_recover_hung_queue code path. The previous implementation set a 'hung bit' which would then get cleared while polling. If the detection routine was called a second time with the bit already set, we would issue a software interrupt. This patch makes it such that if interrupts are disabled and we have pending TX descriptors, we trigger a software interrupt since in, the worst case, queues are already clean and we have an extra interrupt. Additionally this patch removes the workaround for lost interrupts as calling napi_reschedule in this context can cause software interrupts to fire on the wrong CPU. Change-ID: Iae108582a3ceb6229ed1d22e4ed6e69cf97aad8d Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
dab86afd |
|
14-Mar-2017 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e/i40evf: Change the way we limit the maximum frame size for Rx This patch changes the way we handle the maximum frame size for the Rx path. Previously we were rounding up to 2K for a 1500 MTU and then brining the max frame size down to MTU plus a fixed amount. With this patch applied what we now do is limit the maximum frame to 1.5K minus the value for NET_IP_ALIGN for standard MTU, and for any MTU greater than 1500 we allow up to the maximum frame size. This makes the behavior more consistent with the other drivers such as igb which had similar logic. In addition it reduces the test matrix for MTU since we only have two max frame sizes that are handled for Rx now. Change-ID: I23a9d3c857e7df04b0ef28c64df63e659c013f3f Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
9eed69a9 |
|
21-Feb-2017 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e: Drop FCoE code from core driver files Looking over the code for FCoE it looks like the Rx path has been broken at least since the last major Rx refactor almost a year ago. It seems like FCoE isn't supported for any of the Fortville/Fortpark hardware so there isn't much point in carrying the code around, especially if it is broken and untested. Change-ID: I892de8fa551cb129ce2361e738ff82ce55fa229e Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
1793668c |
|
21-Feb-2017 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e/i40evf: Update code to better handle incrementing page count Update the driver code so that we do bulk updates of the page reference count instead of just incrementing it by one reference at a time. The advantage to doing this is that we cut down on atomic operations and this in turn should give us a slight improvement in cycles per packet. In addition if we eventually move this over to using build_skb the gains will be more noticeable. I also found and fixed a store forwarding stall from where we were assigning "*new_buff = *old_buff". By breaking it up into individual copies we can avoid this and as a result the performance is slightly improved. Change-ID: I1d3880dece4133eca3c32423b04a5467321ccc52 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
59605bc0 |
|
30-Jan-2017 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e/i40evf: Add support for mapping pages with DMA attributes This patch adds support for DMA_ATTR_SKIP_CPU_SYNC and DMA_ATTR_WEAK_ORDERING. By enabling both of these for the Rx path we are able to see performance improvements on architectures that implement either one due to the fact that page mapping and unmapping only has to sync what is actually being used instead of the entire buffer. In addition by enabling the weak ordering attribute enables a performance improvement for architectures that can associate a memory ordering with a DMA buffer such as Sparc. Change-ID: If176824e8231c5b24b8a5d55b339a6026738fc75 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
e72e5659 |
|
10-Feb-2017 |
Scott Peterson <scott.d.peterson@intel.com> |
i40e/i40evf: Moves skb from i40e_rx_buffer to i40e_ring This patch reduces the size of struct i40e_rx_buffer by one pointer, and makes the i40e driver a little more consistent with the igb driver in terms of packets that span buffers. We do this by moving the skb field from struct i40e_rx_buffer to struct i40e_ring. We pass the skb we already have (or NULL if we don't) to i40e_fetch_rx_buffer(), which skips the skb allocation if we already have one for this packet. Change-ID: I4ad48a531844494ba0c5d8e1a62209a057f661b0 Signed-off-by: Scott Peterson <scott.d.peterson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
1c0e6a36 |
|
28-Nov-2016 |
Alan Brady <alan.brady@intel.com> |
i40e: refactor macro INTRL_USEC_TO_REG This patch refactors the macro INTRL_USEC_TO_REG into a static inline function and fixes a couple subtle bugs caused by the macro. This patch fixes a bug which was caused by passing a bad register value to the firmware. If enabling interrupt rate limiting, a non-zero value for the rate limit must be used. Otherwise the firmware sets the interrupt rate limit to the maximum value. Due to the limited resolution of the register, attempting to set a value of 1, 2, or 3 would be rounded down to 0 and limiting was left enabled, causing unexpected behavior. This patch also fixes a possible bug in which using the macro itself can introduce unintended side-affects because the macro argument is used more than once in the macro definition (e.g. a variable post-increment argument would perform a double increment on the variable). Without this patch, attempting to set interrupt rate limits of 1, 2, or 3 results in unexpected behavior and future use of this macro could cause subtle bugs. Change-Id: I83ac842de0ca9c86761923d6e3a4d7b1b95f2b3f Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
4293d5f5 |
|
08-Nov-2016 |
Mitch Williams <mitch.a.williams@intel.com> |
i40e: simplify txd use count calculation The i40e_txd_use_count function was fast but confusing. In the comments, it even admits that it's ugly. So replace it with a new function that is (very) slightly faster and has extensive commenting to help the thicker among us (including the author, who will forget in a week) understand how it works. Change-ID: Ifb533f13786a0bf39cb29f77969a5be2c83d9a87 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
1dc8b538 |
|
11-Oct-2016 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e: Reorder logic for coalescing RS bits This patch reorders the logic at the end of i40e_tx_map to address the fact that the logic was rather convoluted and much larger than it needed to be. In order to try and coalesce the code paths I have updated some of the comments and repurposed some of the variables in order to reduce unnecessary overhead. This patch does the following: 1. Quit tracking skb->xmit_more with a flag, just max out packet_stride 2. Drop tail_bump and do_rs and instead just use desc_count and td_cmd 3. Pull comments from ixgbe that make need for wmb() more explicit. Change-ID: Ic7da85ec75043c634e87fef958109789bcc6317c Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
12490501 |
|
05-Oct-2016 |
Jacob Keller <jacob.e.keller@intel.com> |
i40e: replace PTP Rx timestamp hang logic The current Rx timestamp hang logic is not very robust because it does not notice a register is hung until all four timestamps have been latched and we wait a full 5 seconds. Replace this logic with a newer Rx hang detection based on storing the jiffies when we first notice a receive timestamp event. We store each register's time separately, along with a flag indicating if it is currently latched. Upon first transitioning to latch, we will update the latch_events[i] jiffies value. This indicates the time we first noticed this event. The watchdog routine will simply check that the either the flag has been cleared, or we have passed at least one second. In this case, it is able to clear the Rx timestamp register under the assumption that it was for a dropped frame. The benefit if this strategy is that we should be able to detect and clear out stalled RXTIME_H registers before we exhaust the supply of 4, and avoid complete stall of Rx timestamp events. Change-ID: Id55458c0cd7a5dd0c951ff2b8ac0b2509364131f Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
e486bdfd |
|
12-Sep-2016 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e/i40evf: Add txring_txq function to match fm10k and ixgbe This patch adds a txring_txq function which allows us to convert a i40e_ring/i40evf_ring to a netdev_tx_queue structure. This way we can avoid having to make a multi-line function call for all the spots that need access to this. Change-ID: Ic063b71d8b92ea406d2c32e798c8e2b02809d65b Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
bec60fc4 |
|
18-Apr-2016 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e/i40evf: Remove unused hardware receive descriptor code The hardware supports a 16 byte descriptor for receive, but the driver was never using it in production. There was no performance benefit to the real driver of 16 byte descriptors, so drop a whole lot of complexity while getting rid of the code. Also since the previous patch made us use no-split mode all the time, drop any support in the driver for any other value in dtype and assume it is always zero (aka no-split). Hooray for code removal! Change-ID: I2257e902e4dad84a07b94db6d2e6f4ce69b27bc0 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
1a557afc |
|
20-Apr-2016 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e: Refactor receive routine This is part 1 of the Rx refactor series, just including changes to i40e. This refactor aligns the receive routine with the one in ixgbe which was highly optimized. This reduces the code we have to maintain and allows for (hopefully) more readable and maintainable RX hot path. In order to do this: - consolidate the receive path into a single function that doesn't use packet split but *does* use pages for Rx buffers. - remove the old _1buf routine - consolidate several routines into helper functions - remove ethtool control over packet split Change-ID: I5ca100721de65992aa0114f8b4bac844b84758e0 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
04b3b779 |
|
18-Apr-2016 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e/i40evf: Remove reference to ring->dtype As part of the rx-refactor, the dtype variable in the i40e_ring struct is no longer used, so remove it. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
b32bfa17 |
|
18-Apr-2016 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e: Drop packet split receive routine As part of preparation for the rx-refactor, remove the packet split receive routine and ancillary code. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
3f3f7cb8 |
|
30-Mar-2016 |
Alexander Duyck <aduyck@mirantis.com> |
i40e/i40evf: Limit TSO to 7 descriptors for payload instead of 8 per packet This patch addresses a bug introduced based on my interpretation of the XL710 datasheet. Specifically section 8.4.1 states that "A single transmit packet may span up to 8 buffers (up to 8 data descriptors per packet including both the header and payload buffers)." It then later goes on to say that each segment for a TSO obeys the previous rule, however it then refers to TSO header and the segment payload buffers. I believe the actual limit for fragments with TSO and a skbuff that has payload data in the header portion of the buffer is actually only 7 fragments as the skb->data portion counts as 2 buffers, one for the TSO header, and one for a segment payload buffer. Fixes: 2d37490b82af ("i40e/i40evf: Rewrite logic for 8 descriptor per packet check") Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
1f15d667 |
|
01-Apr-2016 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e/i40evf: Faster RX via avoiding FCoE As it turns out, calling into other files from hot path hurts performance a lot. In this case the majority of the time we call "check FCoE" and the packet is *not* FCoE, but this call was taking 5% of our total cycles spent on receive. Change-ID: I080552c26e7060bc7b78504dc2763f6f0b3d8c76 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
5c4654da |
|
19-Feb-2016 |
Alexander Duyck <aduyck@mirantis.com> |
i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead of 8K From what I can tell the practical limitation on the size of the Tx data buffer is the fact that the Tx descriptor is limited to 14 bits. As such we cannot use 16K as is typically used on the other Intel drivers. However artificially limiting ourselves to 8K can be expensive as this means that we will consume up to 10 descriptors (1 context, 1 for header, and 9 for payload, non-8K aligned) in a single send. I propose that we can reduce this by increasing the maximum data for a 4K aligned block to 12K. We can reduce the descriptors used for a 32K aligned block by 1 by increasing the size like this. In addition we still have the 4K - 1 of space that is still unused. We can use this as a bit of extra padding when dealing with data that is not aligned to 4K. By aligning the descriptors after the first to 4K we can improve the efficiency of PCIe accesses as we can avoid using byte enables and can fetch full TLP transactions after the first fetch of the buffer. This helps to improve PCIe efficiency. Below is the results of testing before and after with this patch: Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB Before: 87380 16384 16384 10.00 33682.24 20.27 -1.00 0.592 -1.00 After: 87380 16384 16384 10.00 34204.08 20.54 -1.00 0.590 -1.00 So the net result of this patch is that we have a small gain in throughput due to a reduction in overhead for putting together the frame. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
a75e8005 |
|
19-Feb-2016 |
Kan Liang <kan.liang@intel.com> |
i40e: queue-specific settings for interrupt moderation For i40e driver, each vector has its own ITR register. However, there are no concept of queue-specific settings in the driver proper. Only global variable is used to store ITR values. That will cause problems especially when resetting the vector. The specific ITR values could be lost. This patch move rx_itr_setting and tx_itr_setting to i40e_ring to store specific ITR register for each queue. i40e_get_coalesce and i40e_set_coalesce are also modified accordingly to support queue-specific settings. To make it compatible with old ethtool, if user doesn't specify the queue number, i40e_get_coalesce will return queue 0's value. While i40e_set_coalesce will apply value to all queues. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2d37490b |
|
17-Feb-2016 |
Alexander Duyck <aduyck@mirantis.com> |
i40e/i40evf: Rewrite logic for 8 descriptor per packet check This patch is meant to rewrite the logic for how we determine if we can transmit the frame or if it needs to be linearized. The previous code for this function was using a mix of division and modulus division as a part of computing if we need to take the slow path. Instead I have replaced this by simply working with a sliding window which will tell us if the frame would be capable of causing a single packet to span several descriptors. The logic for the scan is fairly simple. If any given group of 6 fragments is less than gso_size - 1 then it is possible for us to have one byte coming out of the first fragment, 6 fragments, and one or more bytes coming out of the last fragment. This gives us a total of 8 fragments which exceeds what we can allow so we send such frames to be linearized. Arguably the use of modulus might be more exact as the approach I propose may generate some false positives. However the likelihood of us taking much of a hit for those false positives is fairly low, and I would rather not add more overhead in the case where we are receiving a frame composed of 4K pages. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
4ec441df |
|
17-Feb-2016 |
Alexander Duyck <aduyck@mirantis.com> |
i40e/i40evf: Break up xmit_descriptor_count from maybe_stop_tx In an upcoming patch I would like to have access to the descriptor count used for the data portion of the frame. For this reason I am splitting up the descriptor count function from the function that stops the ring. Also in order to try and reduce unnecessary duplication of code I am moving the slow-path portions of the code out of being inline calls so that we can just jump to them and process them instead of having to build them into each function that calls them. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
529f1f65 |
|
24-Jan-2016 |
Alexander Duyck <aduyck@mirantis.com> |
i40e/i40evf: Add exception handling for Tx checksum Add exception handling to the Tx checksum path so that we can handle cases of TSO where the frame is bad, or Tx checksum where we didn't recognize a protocol Drop I40E_TX_FLAGS_CSUM as it is unused, move the CHECKSUM_PARTIAL check into the function itself so that we can decrease indent. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
a9c9a81f |
|
24-Jan-2016 |
Alexander Duyck <aduyck@mirantis.com> |
i40e/i40evf: Drop outer checksum offload that was not requested The i40e and i40evf drivers contained code for inserting an outer checksum on UDP tunnels. The issue however is that the upper levels of the stack never requested such an offload and it results in possible errors. In addition the same logic was being applied to the Rx side where it was attempting to validate the outer checksum, but the logic there was incorrect in that it was testing for the resultant sum to be equal to the header checksum instead of being equal to 0. Since this code is so massively flawed, and doing things that we didn't ask for it to do I am just dropping it, and will bring it back later to use as an offload for SKB_GSO_UDP_TUNNEL_CSUM which can make use of such a feature. As far as the Rx feature I am dropping it completely since it would need to be massively expanded and applied to IPv4 and IPv6 checksums for all parts, not just the one that supports Tx checksum offload for the outer. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
dd353109 |
|
15-Jan-2016 |
Anjali Singhai Jain <anjali.singhai@intel.com> |
i40e: Add a SW workaround for lost interrupts This patch adds a workaround for cases where we might have interrupts that got lost but WB happened. If that happens without this patch we will see a tx_timeout. To work around it, this patch goes ahead and reschedules NAPI in that situation, if NAPI is not already scheduled. We also add a counter in ethtool to keep track of when we detect a case of tx_lost_interrupt. Note: napi_reschedule() can be safely called from process/service_task context and is done in other drivers as well without an issue. Change-ID: I00f98f1ce3774524d9421227652bef20fcbd0d20 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
4668607a |
|
13-Jan-2016 |
Mitch Williams <mitch.a.williams@intel.com> |
i40e: properly show packet split status in debugfs Get rid of the unused hsplit field in the ring struct and use the existing macro to detect packet split enablement. This allows debugfs dumps of the VSI to properly show which Rx routine is in use. Change-ID: Ic4e9589e6a788ab196ed0850703f704e30c03781 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
f16704e5 |
|
13-Jan-2016 |
Mitch Williams <mitch.a.williams@intel.com> |
i40e/i40evf: use pages correctly in Rx Refactor the packet split Rx code to properly use half-pages for receives. The previous code was doing way more mapping and unmapping than it needed to, and wasn't properly using half-pages. Increment the page use count each time we give a half-page to an skb, knowing that the stack will probably process and release the page before we need it again. Only free and reallocate pages if the count shows that both half-pages are in use. Add counters to track reallocations and page reuse. Change-ID: I534b299196036b64be82b4861a0a4036310a8f22 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
c2e245ab |
|
13-Jan-2016 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e/i40evf: try again after failure This is the "Don't Give Up" patch. Previously the driver could fail an allocation, and then possibly stall a queue forever, by never coming back to continue receiving or allocating buffers. With this patch, the driver will keep polling trying to allocate receive buffers until it succeeds. This should keep all receive queues running even in the face of memory pressure. Also update copyright year in file header. Change-ID: I2b103d1ce95b9831288a7222c3343ffa1988b81b Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
6a899024 |
|
14-Dec-2015 |
Singhai, Anjali <anjali.singhai@intel.com> |
i40e: geneve tunnel offload support This patch adds driver hooks to implement ndo_ops to add/del udp port in the HW to identify GENEVE tunnels. Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
164c9f54 |
|
21-Oct-2015 |
Anjali Singhai Jain <anjali.singhai@intel.com> |
i40e/i40evf: Add a stat to track how many times we have to do a force WB When in NAPI with interrupts disabled, the HW needs to be forced to do a write back on TX if the number of descriptors pending are less than a cache line. This stat helps keep track of how many times we get into this situation. Change-ID: I76c1bcc7ebccd6bffcc5aa33bfe05f2fa1c9a984 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
ee2319cf |
|
28-Sep-2015 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e/i40evf: adjust interrupt throttle less frequently The adaptive ITR (interrupt throttle rate) algorithm was adjusting the hardware's interrupt rate too frequently. This caused a lot of variation in the interrupt rate for fairly constant workloads. Change the code to have a counter and adjust only once every N number of interrupts. Change-ID: I0460f1f86571037484eca5aca36ac4d889cb8389 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
c56625d5 |
|
28-Sep-2015 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e/i40evf: change dynamic interrupt thresholds The dynamic algorithm, while now working, doesn't have good performance in 40G mode. One part of this patch addresses the high CPU utilization of some small streaming workloads that the driver should reduce CPU in. It also changes the minimum ITR that the dynamic algorithm will settle on, causing our minimum latency to go from 12us to about 14us, when using adaptive mode. It also changes the BULK interrupt rate to allow maximum throughput on a 40Gb connection with a single thread of transmit, clamping interrupt rate to 8000 for TX makes single thread traffic go too slow. The new ULTRA bulk setting is introduced and is used when the Rx packet rate on this queue exceeds 40000 packets per second. This value of 40000 was chosen because the automatic tuning of minimum ITR=20us means that a single queue can't quite achieve that many packets per second from a round-robin test. Change-ID: Icce8faa128688ca5fd2c4229bdd9726877a92ea2 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
ac26fc13 |
|
28-Sep-2015 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e/i40evf: moderate interrupts differently The XL710 hardware has a different interrupt moderation design that can support a limit of total interrupts per second per vector, in addition to the "number of interrupts per second" controls already established in the driver. This combination of hardware features allows us to set very low default latency settings but minimize the total CPU utilization by not making too many interrupts, should the user desire. The current driver implementation is still enabling the dynamic moderation in the driver, and only using the rx/tx-usecs limit in ethtool to limit the interrupt rate per second, by default. The new code implemented in this patch 2) adds init/use of the new "Interrupt Limit" register 3) adds ethtool knob to control/report the limits above Usage is ethtool -C ethx rx-usecs-high <value> Where <value> is number of microseconds to create a rate of 1/N interrupts per second, regardless of rx-usecs or tx-usecs values. Since there is a credit based scheme in the hardware, the rx-usecs and tx-usecs can be configured for very low latency for short bursts, but once the credit runs out the refill rate on the credits is limited by rx-usecs-high. Change-ID: I3a1075d3296123b0f4f50623c779b027af5b188d Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
6995b36c |
|
28-Aug-2015 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e/i40evf: clean up some code Add missings spaces after declarations, remove another __func__ use, remove uncessary braces, remove unneeded breaks, and useless returns, and generally fix up some code. Change-ID: Ie715d6b64976c50e1c21531685fe0a2bd38c4244 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
2fc3d715 |
|
27-Aug-2015 |
Anjali Singhai Jain <anjali.singhai@intel.com> |
i40e/i40evf: Add a stat to keep track of linearization count Keep track of how many times we ask the stack to linearize the skb because the HW cannot handle skbs with more than 8 frags per segment/single packet. Change-ID: If455452060963a769bbe6112cba952e79e944b52 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
9c70d7ce |
|
13-Aug-2015 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e: fix 32 bit build warnings Sparse found some issues with 32 bit compilation, which probably should at least work without warning. Not only that, but the code was wrong. Thanks sparse!! And thanks to the kbuild robot zero day testing for finding this issue. $ make ARCH=i386 M=drivers/net/ethernet/intel/i40e C=2 CF="-D__CHECK_ENDIAN__" CHECK drivers/net/ethernet/intel/i40e/i40e_main.c include/linux/etherdevice.h:79:32: warning: restricted __be16 degrades to integer drivers/net/ethernet/intel/i40e/i40e_main.c:7565:17: warning: shift too big (32) for type unsigned long drivers/net/ethernet/intel/i40e/i40e_main.c:7565:17: warning: shift too big (42) for type unsigned long drivers/net/ethernet/intel/i40e/i40e_main.c:7565:17: warning: shift too big (39) for type unsigned long drivers/net/ethernet/intel/i40e/i40e_main.c:7565:17: warning: shift too big (40) for type unsigned long CC: kbuild-all@01.org Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
58044743 |
|
25-Sep-2015 |
Anjali Singhai <anjali.singhai@intel.com> |
i40e: Fix RS bit update in Tx path and disable force WB workaround This patch fixes the issue of forcing WB too often causing us to not benefit from NAPI. Without this patch we were forcing WB/arming interrupt too often taking away the benefits of NAPI and causing a performance impact. With this patch we disable force WB in the clean routine for X710 and XL710 adapters. X722 adapters do not enable interrupt to force a WB and benefit from WB_ON_ITR and hence force WB is left enabled for those adapters. For XL710 and X710 adapters if we have less than 4 packets pending a software Interrupt triggered from service task will force a WB. This patch also changes the conditions for setting RS bit as described in code comments. This optimizes when the HW does a tail bump and when it does a WB. It also optimizes when we do a wmb. Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
b03a8c1f |
|
24-Sep-2015 |
Kiran Patil <kiran.patil@intel.com> |
i40e/i40evf: refactor tx timeout logic This patch modifies the driver timeout logic by issuing a writeback request via a software interrupt to the hardware the first time the driver detects a hang. The driver was too aggressive in resetting a hung queue, so back that off by removing logic to down the netdevice after too many hangs, and move the function to the service task. Change-ID: Ife100b9d124cd08cbdb81ab659008c1b9abbedea Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
1e6d6f8c |
|
24-Sep-2015 |
Kiran Patil <kiran.patil@intel.com> |
i40e: Move i40e_get_head into header file i40e_get_head needs to be called in multiple files in a further patch, prepare by moving the function into a header file. Signed-off-by: Kiran Patil <kiran.patil@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
527274c7 |
|
04-Jun-2015 |
Anjali Singhai Jain <anjali.singhai@intel.com> |
i40e/i40evf: Add TX/RX outer UDP checksum support for X722 X722 supports offloading of outer UDP TX and RX checksum for tunneled packets. This patch exposes the support and leaves it enabled by default. Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
8e0764b4 |
|
04-Jun-2015 |
Anjali Singhai Jain <anjali.singhai@intel.com> |
i40e/i40evf: Add support for writeback on ITR feature for X722 X722 fixes an issue from X710 where TX descriptor WB would not happen if the interrupts were disabled. In order for the write backs to happen a bit needs to be set in the dynamic interrupt control register called WB_ON_ITR. With this feature, the SW driver need not arm SW interrupts to work around the issue in X710. Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
e25d00b8 |
|
23-Jun-2015 |
Anjali Singhai Jain <anjali.singhai@intel.com> |
i40e/i40evf: RSS changes for X722 X722 uses the admin queue to configure RSS. This patch adds the necessary flow changes to configure RSS through AQ. It also adds the separate VMDQ2 lookup tables and hash key programming for X722. X722 also exposes a different set of PCTYPES for RSS, this patch accommodates those changes. Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
41a1d04b |
|
04-Jun-2015 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e: use BIT and BIT_ULL macros Use macros for abstracting (1 << foo) to BIT(foo) and (1ULL << foo64) to BIT_ULL(foo64) in order to match better with kernel requirements. NOTE: the adminq_cmd.h file was not modified on purpose because of the dependency upon firmware for that file. Change-ID: I73ee2e48c880d671948aad19bd53ca6b2ac558fc Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
33507598 |
|
16-Apr-2015 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e/i40evf: remove time_stamp member The driver doesn't use the time_stamp member to determine if there is a tx_hang any more. There really isn't any point to the variable at all so just remove it. It was left over from a previous tx_hang design. Change-ID: I4c814827e1bcb46e45118fe37acdcfa814fb62a0 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
89232c3b |
|
16-Apr-2015 |
Anjali Singhai Jain <anjali.singhai@intel.com> |
i40e/i40evf: Add ATR support for tunneled TCP/IPv4/IPv6 packets. Without this, RSS would have done inner header load balancing. Now we can get the benefits of ATR for tunneled packets to better align TX and RX queues with the right core/interrupt. Change-ID: I07d0e0a192faf28fdd33b2f04c32b2a82ff97ddd Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
71da6197 |
|
20-Feb-2015 |
Anjali Singhai <anjali.singhai@intel.com> |
i40e: Fix TSO with more than 8 frags per segment issue The hardware has some limitations the driver needs to adhere to, that we found in extended testing. 1) no more than 8 descriptors per packet on the wire 2) no header can span more than 3 descriptors If one of these events occurs, the hardware will generate an internal error and freeze the Tx queue. This patch linearizes the skb to avoid these situations. Change-ID: I37dab7d3966e14895a9663ec4d0aaa8eb0d9e115 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
a132af24 |
|
24-Jan-2015 |
Mitch Williams <mitch.a.williams@intel.com> |
i40e/i40evf: Refactor the receive routines Split the receive hot path code into two, one for packet split and one for single buffer. This improves receive performance since we only need to check if the ring is in packet split mode once per NAPI poll time, not several times per packet. The single buffer code is further improved by the removal of a bunch of code and several variables that are not needed. On a receive-oriented test this can improve single-threaded throughput. Also refactor the packet split receive path to use a fixed buffer for headers, like ixgbe does. This vastly reduces the number of DMA mappings and unmappings we need to do, allowing for much better performance in the presence of an IOMMU. Lastly, correct packet split descriptor types now that we are actually using them. Change-ID: I3a194a93af3d2c31e77ff17644ac7376da6f3e4b Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
d91649f5 |
|
06-Jan-2015 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e: fix un-necessary Tx hangs When the driver was polling with interrupts disabled the hardware will occasionally not write back descriptors. This patch causes the driver to detect this situation and force an interrupt to fire which will flush the stuck descriptor. Does not conflict with napi because if we are already polling the napi_schedule is ignored. Additionally the extra interrupts are rate limited, so don't cause a burden to the CPU. Change-ID: Iba4616d2a71288672a5f08e4512e2704b97335e8 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
79442d38 |
|
24-Oct-2014 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e: clean up throttle rate code The interrupt throttle rate minimum is actually 2us, so fix that define and while we are there, remove some unused defines. Change some strings in the function to be a bit less wrappy, and express the correct limits. Change-ID: I96829bbc77935e0b57c6f0fc1439fb4152b2960a Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Patrick Lu <patrick.lu@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
810b3ae4 |
|
10-Jul-2014 |
Anjali Singhai Jain <anjali.singhai@intel.com> |
i40e/i40evf: Ignore a driver perceived Tx hang if the number of desc pending < 4 We are seeing situations where the driver sees a hang with less than 4 desc pending, if the driver chooses to ignore it the queue progresses forward and the stack never experiences a real hang. With this patch we will log a stat when this situation happens "tx_sluggish" will increment and we can see some more details at a higher debug level. Other than that we will ignore this particular case of Tx hang. Change-ID: I7d1d1666d990e2b12f4f6bed0d17d22e1b6410d5 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
38e00438 |
|
01-Aug-2014 |
Vasu Dev <vasu.dev@intel.com> |
i40e: Adds FCoE related code to i40e core driver Adds FCoE specific code to existing i40e core driver to:- 1. have separate FCoE VSI with additional FCoE queues pairs. 2. have FCoE related hash defines. 3. have additional FCoE related stats code. 4. export and then re-use existing functions required by FCoE build. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Jack Morgan<jack.morgan@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
49d7d933 |
|
04-Jun-2014 |
Anjali Singhai Jain <anjali.singhai@intel.com> |
i40e/i40evf: Do not free the dummy packet buffer synchronously The HW still needs to consume it and freeing it in the function that created it would mean we will be racing with the HW. The i40e_clean_tx_ring() routine will free up the buffer attached once the HW has consumed it. The clean_fdir_tx_irq function had to be fixed to handle the freeing correctly. Cases where we program more than one filter per flow (Ipv4), the code had to be changed to allocate dummy buffer multiple times since it will be freed by the clean routine. This also fixes an issue where the filter program routine was not checking if there were descriptors available for programming a filter. Change-ID: Idf72028fd873221934e319d021ef65a1e51acaf7 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
c571ea05 |
|
03-Jun-2014 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e/i40evf: remove reserved type One of the PCTYPES that was moved to a reserved value wasn't removed from the code. Change-ID: I31fafe6d79c5f5128179979af5eaafa8c0cd62fe Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
980093eb |
|
09-May-2014 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e/i40evf: fix TSO accounting The TSO logic in the transmit path had some assumptions that have been broken now that the kernel can send as much as 32kB in a single skb->frag[.] entry, even on a system with 4kB pages. This fixes the assumptions and allows the kernel to operate as efficiently as possible with both SENDFILE and SEND. In addition, the hardware limit of data contained in a descriptor is changed to the next power of two below where it currently is in order to align to a power of two value, preventing a single byte of data in a descriptor. Change-ID: I6af1f0b87c1458e10644dbd47541591075a52651 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
d0277054 |
|
22-Apr-2014 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e/i40evf: remove unused RX_LRO define Remove unused defines and macros for RX_LRO. Change-ID: I8ca6715edfa62b56837417a1c4ff68c2345dab6e Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
aee8087f |
|
08-Apr-2014 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e/i40evf: remove storm control The storm control features are not part of the hardware and mistakenly were left in the code. Remove them as they are not needed any more. Change-ID: I6e9277c8da2c52e69348a657bae25271449c2099 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
b2d36c03 |
|
08-Apr-2014 |
Kevin Scott <kevin.c.scott@intel.com> |
i40e/i40evf: Remove reserved PCTYPE defines Patch to remove PCTYPE definitions which are now reserved. Change-ID: I66c1c16a45a16f4894b2983101ab2a48ce03f1f4 Signed-off-by: Kevin Scott <kevin.c.scott@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
beb0dff1 |
|
10-Jan-2014 |
Jacob Keller <jacob.e.keller@intel.com> |
i40e: enable PTP New feature: Enable PTP support in the i40e driver. Change-ID: I6a8e799f582705191f9583afb1b9231a8db96cc8 Cc: Richard Cochran <richardcochran@gmail.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Matthew Vick <matthew.vick@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
3126dcb7 |
|
20-Dec-2013 |
Shannon Nelson <shannon.nelson@intel.com> |
i40e: adjust ITR max and min values Set the ITR max and min values to match the hardware max value and the recommended min value. These values are shifted right one bit because the register counts in 2 usec units, so leave a comment to explain. Change-ID: I289c27955cf6c566a6d21b95c3110b88cbb15dad Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
420136cc |
|
18-Dec-2013 |
Mitch Williams <mitch.a.williams@intel.com> |
i40e: shorten wordy fields The alloc_rx_buff_failed and alloc_rx_page_failed variables are both part of an rx specific structure so just remove the _rx part of the name. No functional changes. Change-ID: Icffa2f5d13c6f2b1e09cf45b9472b83c9dae8fc6 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
dc641b73 |
|
18-Dec-2013 |
Greg Rose <gregory.v.rose@intel.com> |
i40e: Fix GPL header The GPL header included in each file in the i40e driver doesn't need to include the "this program" text since this driver is already part of the larger kernel. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
36fac581 |
|
27-Nov-2013 |
Vasu Dev <vasu.dev@intel.com> |
i40e: add header file flag _I40E_TXRX_H_ Add an include header guard to guard against multiple includes Change-Id: I73efa03efc912d2047edab903c7caed05b444da2 Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
12dc4fe3 |
|
27-Nov-2013 |
Mitch Williams <mitch.a.williams@intel.com> |
i40e: make a define from a large constant Make a define used in the header file by both VF and PF drivers. Change-Id: Ie9e35adcc021cd6a8f7513934984eb4ed55774f5 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
0319577f |
|
20-Nov-2013 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e: remove and fix confusing define name I40E_ITR_NONE was being used as an ITRN register index by accident because it was easily associated with the I40E_RX_ITR and friends defines. Change the name slightly in order to make it clear that I40E_ITR_NONE is really associated with the DYN_CTL register sets. Change-Id: I04702c027c7495b90a8bf2db85d3e085a2c7d02a Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
980e9b11 |
|
28-Sep-2013 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e: Add support for 64 bit netstats This change brings support for 64 bit netstats to the driver. Previously the stats were 64 bit but highly racy due to the fact that 64 bit transactions are not atomic on 32 bit systems. This change makes is so that the 64 bit byte and packet stats are reliable on all architectures. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
9f65e15b |
|
28-Sep-2013 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e: Move rings from pointer to array to array of pointers Allocate the queue pairs individually instead of as a group. This allows for much easier queue management as it is possible to dynamically resize the queues without having to free and allocate the entire block. Ease statistic collection by treating Tx/Rx queue pairs as a single unit. Each pair is allocated together and starts with a Tx queue and ends with an Rx queue. By ordering them this way it is possible to know the Rx offset based on a pointer to the Tx queue. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
cd0b6fa6 |
|
28-Sep-2013 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e: Replace ring container array with linked list This replaces the ring container array with a linked list. The idea is to make the logic much easier to deal with since this will allow us to call a simple helper function from the q_vectors to go through the entire list. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
a114d0a6 |
|
28-Sep-2013 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e: Split bytes and packets from Rx/Tx stats This makes it so that the Tx and Rx byte and packet counts are separated from the rest of the statistics. This allows for better isolation of these stats when we move them into the 64 bit statistics. Simplify things by re-ordering how the stats display in ethtool. Instead of displaying all of the Tx queues as a block, followed by all the Rx queues, the new order is Tx[0], Rx[0], Tx[1], Rx[1], ..., Tx[n], Rx[n]. This reduces the loops and cleans up the display for testing purposes since it is very easy to verify if flow director is doing the right thing as the Tx and Rx queue pair are shown in pairs. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
b1941306 |
|
28-Sep-2013 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e: Drop dead code and flags from Tx hotpath Drop Tx flag and TXSW which is tested but never set. As a result of this change we can drop a complicated check that always resulted in the final result of i40e_tx_csum being equal to the CHECKSUM_PARTIAL value. As such we can replace the entire function call with just a check for skb->summed == CHECKSUM_PARTIAL. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
a5e9c572 |
|
28-Sep-2013 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e: clean up Tx fast path Sync the fast path for i40e_tx_map and i40e_clean_tx_irq so that they are similar to igb and ixgbe. - Only update the Tx descriptor ring in tx_map - Make skb mapping always on the first buffer in the chain - Drop the use of MAPPED_AS_PAGE Tx flag - Only store flags on the first buffer_info structure Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
35a1e2ad |
|
28-Sep-2013 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e: Cleanup Tx buffer info layout - drop the mapped_as_page u8 from the Tx buffer info as it was unused - use the DMA unmap accessors for Tx DMA - replace checks of DMA with checks of the unmap length to verify if an unmap is needed - update the Tx buffer layout to make it consistent with igb, ixgbe Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
c304fdac |
|
28-Sep-2013 |
Alexander Duyck <alexander.h.duyck@intel.com> |
i40e: Drop unused completed stat The Tx "completed" stat was part of the original rewrite for detecting Tx hangs. However some time ago in ixgbe I determined that we could just use the packets stat instead. Since then this stat was removed from ixgbe and it serves no purpose in i40e so it can be dropped. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
7daa6bf3 |
|
11-Sep-2013 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
i40e: driver core headers This patch contains the main driver header files, containing structures and data types specific to the linux driver. i40e_osdep.h contains some code that helps us adapt our OS agnostic code to Linux. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> CC: PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com> CC: e1000-devel@lists.sourceforge.net Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|