#
6cc9c6fb |
|
05-Feb-2024 |
Simon Horman <horms@kernel.org> |
mlx4: Address spelling errors Address spelling errors flagged by codespell. This patch follows-up on an earlier patch by Colin Ian King, which addressed a spelling error in a user-visible log message [1]. This patch includes that change. [1] https://lore.kernel.org/netdev/20231209225135.4055334-1-colin.i.king@gmail.com/ This patch is intended to cover all files under drivers/net/ethernet/mellanox/mlx4 Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240205-mlx5-codespell-v1-1-63b86dffbb61@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
73382e91 |
|
09-Oct-2023 |
Christian Marangi <ansuelsmth@gmail.com> |
netdev: replace napi_reschedule with napi_schedule Now that napi_schedule returns a bool, we can drop napi_reschedule, which does exactly the same thing. The function comes from a very old commit bfe13f54f502 ("ibm_emac: Convert to use napi_struct independent of struct net_device") and the purpose is actually deprecated in favour of different logic. Convert every user of napi_reschedule to napi_schedule. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> # ath10k Acked-by: Nick Child <nnac123@linux.ibm.com> # ibm Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for can/dev/rx-offload.c Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20231009133754.9834-3-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
9123397a |
|
12-Apr-2023 |
Jesper Dangaard Brouer <brouer@redhat.com> |
mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type via matching individual Completion Queue Entry (CQE) status bits. Fixes: ab46182d0dcb ("net/mlx4_en: Support RX XDP metadata") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/168132893562.340624.12779118462402031248.stgit@firesoul Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
0cd917a4 |
|
12-Apr-2023 |
Jesper Dangaard Brouer <brouer@redhat.com> |
xdp: rss hash types representation The RSS hash type specifies what portion of packet data NIC hardware used when calculating the RSS hash value. The RSS types are focused on Internet traffic protocols at OSI layers L3 and L4. L2 traffic (e.g. ARP) often gets hash value zero and no RSS type. L3 is focused on IPv4 vs. IPv6, and L4 primarily on TCP vs. UDP, but some hardware supports SCTP. Hardware RSS types are encoded differently for each hardware NIC. Most hardware represents the RSS hash type as a number. Determining L3 vs L4 often requires a mapping table, as there often isn't a pattern or sorting according to OSI layer. The patch introduces an XDP RSS hash type (enum xdp_rss_hash_type) that contains both BITs for the L3/L4 types, and combinations to be used by drivers for their mapping tables. The enum xdp_rss_type_bits gets exposed to BPF via BTF, and it is up to the BPF-programmer to match using these defines. This proposal changes the kfunc API bpf_xdp_metadata_rx_hash(), adding a pointer value argument to provide the RSS hash type. Change signature for all xmo_rx_hash calls in drivers to make it compile. The RSS type implementations for each driver come as separate patches. Fixes: 3d76a4d3d4e5 ("bpf: XDP metadata RX kfuncs") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/168132892042.340624.582563003880565460.stgit@firesoul Signed-off-by: Alexei Starovoitov <ast@kernel.org>
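The bit-plus-combination layout and the per-driver mapping table described in this commit can be sketched roughly as follows. This is a user-space illustration only: every `demo_`/`DEMO_` name, the bit positions, and the hardware type codes are invented here and are not the kernel's `enum xdp_rss_hash_type` values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit layout; the kernel's enum uses its own names and values. */
enum demo_rss_bits {
	DEMO_RSS_L3_IPV4 = 1 << 0,
	DEMO_RSS_L3_IPV6 = 1 << 1,
	DEMO_RSS_L4      = 1 << 2, /* set whenever any L4 header was hashed */
	DEMO_RSS_L4_TCP  = 1 << 3,
	DEMO_RSS_L4_UDP  = 1 << 4,
};

/* Combinations a driver would reference from its mapping table. */
enum demo_rss_type {
	DEMO_RSS_TYPE_NONE     = 0,
	DEMO_RSS_TYPE_IPV4     = DEMO_RSS_L3_IPV4,
	DEMO_RSS_TYPE_IPV6     = DEMO_RSS_L3_IPV6,
	DEMO_RSS_TYPE_IPV4_TCP = DEMO_RSS_L3_IPV4 | DEMO_RSS_L4 | DEMO_RSS_L4_TCP,
	DEMO_RSS_TYPE_IPV6_UDP = DEMO_RSS_L3_IPV6 | DEMO_RSS_L4 | DEMO_RSS_L4_UDP,
};

/* Mapping table indexed by an invented hardware type code. The order is
 * deliberately unsorted: hardware codes need not follow the layer order. */
static const enum demo_rss_type demo_hw_to_rss[] = {
	DEMO_RSS_TYPE_NONE,     /* 0: no hash, e.g. an L2-only frame */
	DEMO_RSS_TYPE_IPV4_TCP, /* 1 */
	DEMO_RSS_TYPE_IPV6,     /* 2 */
	DEMO_RSS_TYPE_IPV6_UDP, /* 3 */
	DEMO_RSS_TYPE_IPV4,     /* 4 */
};

#define DEMO_HW_CODES (sizeof(demo_hw_to_rss) / sizeof(demo_hw_to_rss[0]))

/* Translate a hardware code; unknown codes map to "no RSS type". */
static enum demo_rss_type demo_rss_from_hw(unsigned int hw_code)
{
	return hw_code < DEMO_HW_CODES ? demo_hw_to_rss[hw_code]
				       : DEMO_RSS_TYPE_NONE;
}

/* A consumer only has to test bits, not know the hardware encoding. */
static int demo_rss_is_l4(enum demo_rss_type type)
{
	/* Invariant of this sketch: a protocol bit implies the generic L4 bit. */
	assert(!(type & (DEMO_RSS_L4_TCP | DEMO_RSS_L4_UDP)) ||
	       (type & DEMO_RSS_L4));
	return (type & DEMO_RSS_L4) != 0;
}
```

With this shape, the hardware-specific knowledge stays in one table, and the program consuming the type matches on individual bits.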
|
#
915efd8a |
|
21-Mar-2023 |
Jesper Dangaard Brouer <brouer@redhat.com> |
xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support When a driver doesn't implement a bpf_xdp_metadata kfunc, the fallback implementation returns EOPNOTSUPP, which indicates that the device driver doesn't implement this kfunc. Currently many drivers also return EOPNOTSUPP when the hint isn't available, which is ambiguous from an API point of view. Instead change drivers to return ENODATA in these cases. There can be natural cases why a driver doesn't provide any hardware info for a specific hint, even on a frame to frame basis (e.g. PTP). Let's keep these cases as separate return codes. When describing the return values, adjust the function kernel-doc layout to get proper rendering for the return values. Fixes: ab46182d0dcb ("net/mlx4_en: Support RX XDP metadata") Fixes: bc8d405b1ba9 ("net/mlx5e: Support RX XDP metadata") Fixes: 306531f0249f ("veth: Support RX XDP metadata") Fixes: 3d76a4d3d4e5 ("bpf: XDP metadata RX kfuncs") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/167940675120.2718408.8176058626864184420.stgit@firesoul Signed-off-by: Alexei Starovoitov <ast@kernel.org>
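The distinction between the two return codes can be illustrated with a small user-space sketch. It assumes a Linux-style `errno.h` that defines `ENODATA`; the `rx_frame` and `metadata_ops` structures and all function names are hypothetical stand-ins, not the kernel's kfunc plumbing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware metadata of one received frame. */
struct rx_frame {
	bool has_timestamp;
	uint64_t hw_timestamp;
};

/* Hypothetical per-driver ops table; a NULL hook means "not implemented". */
struct metadata_ops {
	int (*rx_timestamp)(const struct rx_frame *frame, uint64_t *ts);
};

/* A driver that implements the hint but may lack data for a given frame. */
static int demo_rx_timestamp(const struct rx_frame *frame, uint64_t *ts)
{
	if (!frame->has_timestamp)
		return -ENODATA; /* supported, but nothing for this frame */
	*ts = frame->hw_timestamp;
	return 0;
}

/* Fallback wrapper: -EOPNOTSUPP is reserved for "driver has no hook". */
static int read_rx_timestamp(const struct metadata_ops *ops,
			     const struct rx_frame *frame, uint64_t *ts)
{
	assert(frame != NULL && ts != NULL);
	if (ops == NULL || ops->rx_timestamp == NULL)
		return -EOPNOTSUPP;
	return ops->rx_timestamp(frame, ts);
}
```

A caller can then stop probing a device that returns `-EOPNOTSUPP`, while treating `-ENODATA` as a per-frame condition worth retrying on the next packet.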
|
#
ab46182d |
|
19-Jan-2023 |
Stanislav Fomichev <sdf@google.com> |
net/mlx4_en: Support RX XDP metadata RX timestamp and hash for now. Tested using the prog from the next patch. Also enabling xdp metadata support; don't see why it's disabled, there is enough headroom.. Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230119221536.3349901-14-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
4444584d |
|
19-Jan-2023 |
Stanislav Fomichev <sdf@google.com> |
net/mlx4_en: Introduce wrapper for xdp_buff No functional changes. Boilerplate to allow stuffing more data after xdp_buff. Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230119221536.3349901-13-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
3c2dfb73 |
|
12-Mar-2022 |
Julia Lawall <Julia.Lawall@inria.fr> |
net/mlx4_en: use kzalloc Use kzalloc instead of kmalloc + memset. The semantic patch that makes this change is: (https://coccinelle.gitlabpages.inria.fr/website/) //<smpl> @@ expression res, size, flag; @@ - res = kmalloc(size, flag); + res = kzalloc(size, flag); ... - memset(res, 0, size); //</smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220312102705.71413-3-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
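The same transformation has a direct user-space analogue, sketched below with `malloc()` + `memset()` versus `calloc()`. This is only an analogy for the kernel's `kmalloc()`/`kzalloc()` pair, and `struct demo_ring` is an invented placeholder.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Invented placeholder for a structure that must start out zeroed. */
struct demo_ring {
	unsigned int size;
	unsigned int prod;
	unsigned int cons;
};

/* Before: allocate, then clear in a second step. */
static struct demo_ring *ring_alloc_two_step(void)
{
	struct demo_ring *ring = malloc(sizeof(*ring));

	if (ring)
		memset(ring, 0, sizeof(*ring));
	return ring;
}

/* After: a single call that returns zero-initialized memory. */
static struct demo_ring *ring_alloc_zeroed(void)
{
	return calloc(1, sizeof(struct demo_ring));
}
```

Both helpers hand back identical contents; the second simply leaves no window in which the memory is allocated but not yet cleared.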
|
#
c8064e5b |
|
30-Nov-2021 |
Paolo Abeni <pabeni@redhat.com> |
bpf: Let bpf_warn_invalid_xdp_action() report more info In non trivial scenarios, the action id alone is not sufficient to identify the program causing the warning. Before the previous patch, the generated stack-trace pointed out at least the involved device driver. Let's additionally include the program name and id, and the relevant device name. If the user needs additional infos, he can fetch them via a kernel probe, leveraging the arguments added here. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com
|
#
dee3b2d0 |
|
29-Sep-2021 |
Joshua Roys <roysjosh@gmail.com> |
net/mlx4_en: Add XDP_REDIRECT statistics Add counters for XDP REDIRECT success and failure. This brings the redirect path in line with metrics gathered via the other XDP paths. Signed-off-by: Joshua Roys <roysjosh@gmail.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a8551c9b |
|
22-Sep-2021 |
Joshua Roys <roysjosh@gmail.com> |
net: mlx4: Add support for XDP_REDIRECT Signed-off-by: Joshua Roys <roysjosh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eb9c5c0d |
|
22-Aug-2021 |
Christophe JAILLET <christophe.jaillet@wanadoo.fr> |
net/mellanox: switch from 'pci_' to 'dma_' API The wrappers in include/linux/pci-dma-compat.h should go away. The patch has been generated with the coccinelle script below. It has been hand modified to use 'dma_set_mask_and_coherent()' instead of 'pci_set_dma_mask()/pci_set_consistent_dma_mask()' when applicable. This is less verbose. It has been compile tested. @@ @@ - PCI_DMA_BIDIRECTIONAL + DMA_BIDIRECTIONAL @@ @@ - PCI_DMA_TODEVICE + DMA_TO_DEVICE @@ @@ - PCI_DMA_FROMDEVICE + DMA_FROM_DEVICE @@ @@ - PCI_DMA_NONE + DMA_NONE @@ expression e1, e2, e3; @@ - pci_alloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3; @@ - pci_zalloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3, e4; @@ - pci_free_consistent(e1, e2, e3, e4) + dma_free_coherent(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_single(e1, e2, e3, e4) + dma_map_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_single(e1, e2, e3, e4) + dma_unmap_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4, e5; @@ - pci_map_page(e1, e2, e3, e4, e5) + dma_map_page(&e1->dev, e2, e3, e4, e5) @@ expression e1, e2, e3, e4; @@ - pci_unmap_page(e1, e2, e3, e4) + dma_unmap_page(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_sg(e1, e2, e3, e4) + dma_map_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_sg(e1, e2, e3, e4) + dma_unmap_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_cpu(e1, e2, e3, e4) + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_device(e1, e2, e3, e4) + dma_sync_single_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4) + dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_device(e1, e2, e3, e4) + dma_sync_sg_for_device(&e1->dev, e2, e3, e4) 
@@ expression e1, e2; @@ - pci_dma_mapping_error(e1, e2) + dma_mapping_error(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_dma_mask(e1, e2) + dma_set_mask(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_consistent_dma_mask(e1, e2) + dma_set_coherent_mask(&e1->dev, e2) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c4411b37 |
|
24-Jun-2021 |
Toke Høiland-Jørgensen <toke@redhat.com> |
mlx4: Remove rcu_read_lock() around XDP program invocation The mlx4 driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Also switch the RCU dereferences in the driver loop itself to the _bh variants. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/bpf/20210624160609.292325-14-toke@redhat.com
|
#
c420c989 |
|
07-Jun-2021 |
Matteo Croce <mcroce@microsoft.com> |
skbuff: add a parameter to __skb_frag_unref This is a prerequisite patch, the next one is enabling recycling of skbs and fragments. Add an extra argument on __skb_frag_unref() to handle recycling, and update the current users of the function with that. Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
be9df4af |
|
22-Dec-2020 |
Lorenzo Bianconi <lorenzo@kernel.org> |
net, xdp: Introduce xdp_prepare_buff utility routine Introduce xdp_prepare_buff utility routine to initialize per-descriptor xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff() in all XDP capable drivers. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Shay Agroskin <shayagr@amazon.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Acked-by: Camelia Groza <camelia.groza@nxp.com> Acked-by: Marcin Wojtas <mw@semihalf.com> Link: https://lore.kernel.org/bpf/45f46f12295972a97da8ca01990b3e71501e9d89.1608670965.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
43b5169d |
|
22-Dec-2020 |
Lorenzo Bianconi <lorenzo@kernel.org> |
net, xdp: Introduce xdp_init_buff utility routine Introduce xdp_init_buff utility routine to initialize xdp_buff fields const over NAPI iterations (e.g. frame_sz or rxq pointer). Rely on xdp_init_buff in all XDP capable drivers. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Shay Agroskin <shayagr@amazon.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Acked-by: Camelia Groza <camelia.groza@nxp.com> Acked-by: Marcin Wojtas <mw@semihalf.com> Link: https://lore.kernel.org/bpf/7f8329b6da1434dc2b05a77f2e800b29628a8913.1608670965.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
80a62dee |
|
10-Dec-2020 |
Thomas Gleixner <tglx@linutronix.de> |
net/mlx4: Replace irq_to_desc() abuse No driver has any business with the internals of an interrupt descriptor. Storing a pointer to it just to use yet another helper at the actual usage site to retrieve the affinity mask is creative at best. Just because C does not allow encapsulation does not mean that the kernel has no limits. Retrieve a pointer to the affinity mask itself and use that. It's still using an interface which is usually not for random drivers, but definitely less hideous than the previous hack. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20201210194044.580936243@linutronix.de
|
#
b02e5a0e |
|
30-Nov-2020 |
Björn Töpel <bjorn@kernel.org> |
xsk: Propagate napi_id to XDP socket Rx path Add napi_id to the xdp_rxq_info structure, and make sure the XDP socket pick up the napi_id in the Rx path. The napi_id is used to find the corresponding NAPI structure for socket busy polling. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/bpf/20201130185205.196029-7-bjorn.topel@gmail.com
|
#
1a0058cf |
|
17-Nov-2020 |
Tariq Toukan <tariqt@nvidia.com> |
net/mlx4_en: Remove unused performance counters Performance analysis counters are maintained under the MLX4_EN_PERF_STAT definition, which is never set. Clean them up, with all related structures and logic. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Link: https://lore.kernel.org/r/20201118103427.4314-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
785d21b8 |
|
06-Nov-2020 |
Kaixu Xia <kaixuxia@tencent.com> |
net/mlx4: Assign boolean values to a bool variable Fix the following coccinelle warnings: ./drivers/net/ethernet/mellanox/mlx4/en_rx.c:687:1-17: WARNING: Assignment of 0/1 to bool variable Reported-by: Tosk Robot <tencent_os_robot@tencent.com> Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/1604732038-6057-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
b2b8a927 |
|
08-Oct-2020 |
Jonathan Lemon <bsd@fb.com> |
mlx4: handle non-napi callers to napi_poll netcons calls napi_poll with a budget of 0 to transmit packets. Handle this by: - skipping RX processing - do not try to recycle TX packets to the RX cache Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
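The two rules in this commit (skip RX work and skip TX-to-RX recycling when the budget is 0) can be sketched as below. This is a simplified user-space model: it folds TX completion and RX processing into one hypothetical `demo_poll()` and tracks packets as plain counters, which is not how the driver structures its poll routines.

```c
#include <assert.h>

/* Invented counters standing in for one queue's state. */
struct demo_queue {
	int rx_pending; /* received packets waiting to be processed */
	int tx_pending; /* transmitted packets waiting to be completed */
	int recycled;   /* TX buffers handed back to the RX cache */
};

/* Returns the number of RX packets processed in this poll. */
static int demo_poll(struct demo_queue *q, int budget)
{
	assert(q != 0 && budget >= 0);

	/* TX completions are reaped regardless of budget. */
	int tx_done = q->tx_pending;
	q->tx_pending = 0;

	if (budget == 0)
		return 0; /* non-NAPI caller: no recycling, no RX processing */

	q->recycled += tx_done; /* recycling is only attempted with a budget */

	int rx_done = q->rx_pending < budget ? q->rx_pending : budget;
	q->rx_pending -= rx_done;
	return rx_done;
}
```

A budget of 0 therefore still frees TX slots, which is all a caller such as netcons needs, while leaving the RX side untouched.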
|
#
aed4d4c6 |
|
26-Aug-2020 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx4_en: RX, Add a prefetch command for small L1_CACHE_BYTES A single cacheline might not contain the packet header for small L1_CACHE_BYTES values. Use net_prefetch() as it issues an additional prefetch in this case. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5e619d73 |
|
26-Jul-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
net/mlx4: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d201ea9e |
|
13-May-2020 |
Jesper Dangaard Brouer <brouer@redhat.com> |
mlx4: Add XDP frame size and adjust max XDP MTU The mlx4 drivers size of memory backing the RX packet is stored in frag_stride. For XDP mode this will be PAGE_SIZE (normally 4096). For normal mode frag_stride is 2048. Also adjust MLX4_EN_MAX_XDP_MTU to take tailroom into account. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Link: https://lore.kernel.org/bpf/158945341893.97035.2688142527052329942.stgit@firesoul
|
#
cf4058db |
|
22-Apr-2020 |
Eric Dumazet <edumazet@google.com> |
net/mlx4_en: use napi_complete_done() in TX completion In order to benefit from the new napi_defer_hard_irqs feature, we need to use napi_complete_done() variant in this driver. RX path is already using it, this patch implements TX completion side. mlx4_en_process_tx_cq() now returns the amount of retired packets, instead of a boolean, so that mlx4_en_poll_tx_cq() can pass this value to napi_complete_done(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
48ec7014 |
|
12-Aug-2019 |
Wenwen Wang <wenwen@cs.uga.edu> |
net/mlx4_en: fix a memory leak bug In mlx4_en_config_rss_steer(), 'rss_map->indir_qp' is allocated through kzalloc(). After that, mlx4_qp_alloc() is invoked to configure RSS indirection. However, if mlx4_qp_alloc() fails, the allocated 'rss_map->indir_qp' is not deallocated, leading to a memory leak bug. To fix the above issue, add the 'qp_alloc_err' label to free 'rss_map->indir_qp'. Fixes: 4931c6ef04b4 ("net/mlx4_en: Optimized single ring steering") Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
#
74abc07d |
|
11-Feb-2019 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames When an ethernet frame is padded to meet the minimum ethernet frame size, the padding octets are not covered by the hardware checksum. Fortunately the padding octets are usually zero's, which don't affect checksum. However, it is not guaranteed. For example, switches might choose to make other use of these octets. This repeatedly causes kernel hardware checksum fault. Prior to the cited commit below, skb checksum was forced to be CHECKSUM_NONE when padding is detected. After it, we need to keep skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP headers, it does not worth the effort as the packets are so small that CHECKSUM_COMPLETE has no significant advantage. Future work: when reporting checksum complete is not an option for IP non-TCP/UDP packets, we can actually fallback to report checksum unnecessary, by looking at cqe IPOK bit. Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
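The policy described above, falling back to "no hardware checksum" for frames small enough to have been padded, might look like the following sketch. The threshold (minimum frame length of 60 bytes plus a 4-byte FCS) and all `demo_`/`DEMO_` names are assumptions made for illustration; the commit message itself does not state the exact cutoff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_csum {
	DEMO_CSUM_NONE,     /* let the stack verify the checksum in software */
	DEMO_CSUM_COMPLETE, /* trust the checksum supplied by hardware */
};

#define DEMO_ETH_ZLEN    60 /* minimum Ethernet frame length, without FCS */
#define DEMO_ETH_FCS_LEN 4  /* frame check sequence length */

/* Assumed cutoff: frames this small may carry padding octets. */
static bool demo_short_frame(size_t len)
{
	return len <= DEMO_ETH_ZLEN + DEMO_ETH_FCS_LEN;
}

static enum demo_csum demo_rx_csum_mode(size_t frame_len, bool hw_csum_ok)
{
	assert(frame_len > 0); /* a zero-length frame is a caller bug */

	/* Padding is not covered by the hardware checksum, so do not
	 * report it for short frames even when the hardware says OK. */
	if (!hw_csum_ok || demo_short_frame(frame_len))
		return DEMO_CSUM_NONE;
	return DEMO_CSUM_COMPLETE;
}
```

The cost of the fallback is small because software checksumming a frame of this size touches only a few dozen bytes.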
|
#
29dded89 |
|
11-Feb-2019 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames When an ethernet frame is padded to meet the minimum ethernet frame size, the padding octets are not covered by the hardware checksum. Fortunately the padding octets are usually zero's, which don't affect checksum. However, it is not guaranteed. For example, switches might choose to make other use of these octets. This repeatedly causes kernel hardware checksum fault. Prior to the cited commit below, skb checksum was forced to be CHECKSUM_NONE when padding is detected. After it, we need to keep skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP headers, it does not worth the effort as the packets are so small that CHECKSUM_COMPLETE has no significant advantage. Future work: when reporting checksum complete is not an option for IP non-TCP/UDP packets, we can actually fallback to report checksum unnecessary, by looking at cqe IPOK bit. Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4beaacc6 |
|
13-Dec-2018 |
Eric Dumazet <edumazet@google.com> |
net/mlx4_en: remove fallback after kzalloc_node() kzalloc_node(..., GFP_KERNEL, node) will attempt to allocate memory as close as possible to the node. There is no need to fallback to kzalloc() if this has failed. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4b17f9fe |
|
08-Nov-2018 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
mlx4: use __vlan_hwaccel helpers Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3aa8029e |
|
30-Oct-2018 |
Eric Dumazet <edumazet@google.com> |
net/mlx4_en: add a missing <net/ip.h> include Abdul Haleem reported a build error on ppc : drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: `struct iphdr` declared inside parameter list [enabled by default] struct iphdr *iph) ^ drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] drivers/net/ethernet/mellanox/mlx4/en_rx.c: In function get_fixed_ipv4_csum: drivers/net/ethernet/mellanox/mlx4/en_rx.c:586:20: error: dereferencing pointer to incomplete type __u8 ipproto = iph->protocol; ^ Fixes: 55469bc6b577 ("drivers: net: remove <net/busy_poll.h> inclusion when not needed") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
55469bc6 |
|
25-Oct-2018 |
Eric Dumazet <edumazet@google.com> |
drivers: net: remove <net/busy_poll.h> inclusion when not needed Drivers using generic NAPI interface no longer need to include <net/busy_poll.h>, since busy polling was moved to core networking stack long ago. See commit 79e7fff47b7b ("net: remove support for per driver ndo_busy_poll()") for reference. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c8581f2b |
|
07-Aug-2018 |
Gustavo A. R. Silva <gustavo@embeddedor.com> |
net/mlx4/en_rx: Mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 114794 ("Missing break in switch") Addresses-Coverity-ID: 114795 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
432e629e |
|
15-Jul-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx4_en: Don't reuse RX page when XDP is set When a new rx packet arrives, the rx path will decide whether to reuse the remainder of the page or not according to one of the below conditions: 1. frag_info->frag_stride == PAGE_SIZE / 2 2. frags->page_offset + frag_info->frag_size > PAGE_SIZE; The first condition is not met when XDP is set. For XDP, page_offset is always set to priv->rx_headroom, which is XDP_PACKET_HEADROOM, and frag_info->frag_size is around mtu size + some padding. The 2nd release condition still does not trigger, since XDP_PACKET_HEADROOM + 1536 < PAGE_SIZE; as a result the page will not be released and will be _wrongly_ reused for the next free rx descriptor. In XDP there is an assumption to have a page per packet, and reuse can break such assumption and might cause packet data corruptions. Fix this by adding an extra condition (!priv->rx_headroom) to the 2nd case to avoid page reuse when XDP is set, since rx_headroom is set to 0 for non XDP setup and set to XDP_PACKET_HEADROOM for XDP setup. No additional cache line is required for the new condition. Fixes: 34db548bfb95 ("mlx4: add page recycling in receive path") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Suggested-by: Martin KaFai Lau <kafai@fb.com> CC: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
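The effect of the extra `!priv->rx_headroom` condition can be checked with a small sketch of the second release rule only. The page size of 4096, the headroom of 256, and the `demo_` function names are assumptions for illustration; the real function also handles the half-page stride case, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE    4096u /* assumed page size */
#define DEMO_XDP_HEADROOM 256u  /* assumed XDP headroom */

/* Second release rule as it stood before the fix. */
static bool demo_release_before_fix(uint32_t page_offset, uint32_t frag_size)
{
	return page_offset + frag_size > DEMO_PAGE_SIZE;
}

/* After the fix: a non-zero headroom (XDP mode) always releases the page. */
static bool demo_release_after_fix(uint32_t rx_headroom, uint32_t page_offset,
				   uint32_t frag_size)
{
	assert(page_offset <= DEMO_PAGE_SIZE);

	if (rx_headroom != 0)
		return true; /* XDP expects one page per packet: never reuse */
	return page_offset + frag_size > DEMO_PAGE_SIZE;
}
```

With an offset of 256 and a fragment of 1536 bytes, the old rule evaluates to false (256 + 1536 = 1792, which is below 4096), so the page was kept for reuse; the guarded version releases it.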
|
#
2d943adf |
|
19-Apr-2018 |
Eric Dumazet <edumazet@google.com> |
net/mlx4_en: optimizes get_fixed_ipv6_csum() While trying to support CHECKSUM_COMPLETE for IPV6 fragments, I had to experiment with various hacks in get_fixed_ipv6_csum(). I must admit I could not find how to implement this :/ However, get_fixed_ipv6_csum() does a lot of redundant operations, calling csum_partial() twice. First csum_partial() computes the checksum of saddr and daddr, put in @csum_pseudo_hdr. Undone later in the second csum_partial() computed on whole ipv6 header. Then nexthdr is added once, added a second time, then subtracted. payload_len is added once, then subtracted. Really all this can be reduced to two add_csum(), to add back 6 bytes that were removed by mlx4 when providing hw_checksum in RX descriptor. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e5e0a59b |
|
17-Apr-2018 |
Nikita V. Shirokov <tehnerd@tehnerd.com> |
bpf: make mlx4 compatible w/ bpf_xdp_adjust_tail w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as well (only "decrease" of pointer's location is going to be supported). changing of this pointer will change packet's size. for mlx4 driver we will just calculate packet's length unconditionally (the same way as it's already being done in mlx5) Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
ce5a453c |
|
27-Mar-2018 |
Eric Dumazet <edumazet@google.com> |
net/mlx4_en: CHECKSUM_COMPLETE support for fragments Refine the RX check summing handling to propagate the hardware provided checksum so that we do not have to compute it later in software. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d8c13f22 |
|
06-Mar-2018 |
Eric Dumazet <edumazet@google.com> |
net/mlx4_en: try to use high order pages for RX rings RX rings can fit most of the time in a contiguous piece of memory, so lets use kvzalloc_node/kvfree instead of vzalloc_node/vfree Note that kvzalloc_node() automatically falls back to another node, there is no need to do the fallback ourselves. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a970d8db |
|
27-Feb-2018 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx4_en: RX csum, pre-define enabled protocols for IP status masking Pre-define a mask for IP status of a completion, that tests the MLX4_CQE_STATUS_IPV6 only in case CONFIG_IPV6 is enabled. Use it for IP status testing upon completion, instead of separating the datapath into two flows. This takes common code structures (such as closing parenthesis) back to their original place, and makes code more readable. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1cb8b121 |
|
27-Feb-2018 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx4_en: Combine checks of end-cases in RX completion function Combine two end-cases in the same if statement with a single return value. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae75415d |
|
03-Jan-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
mlx4: setup xdp_rxq_info Driver hook points for xdp_rxq_info: * reg : mlx4_en_create_rx_ring * unreg: mlx4_en_destroy_rx_ring Tested on actual hardware. Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
dc484851 |
|
28-Dec-2017 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx4_en: RX csum, reorder branches Use early goto commands, and save else branches. This uses less indentations and brackets, making the code more readable. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
345ef18c |
|
28-Dec-2017 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx4_en: RX csum, remove redundant branches and checks Do not check IPv6 bit in cqe status if CONFIG_IPV6 is not enabled. Function check_csum() is reached only with IPv4 or IPv6 set (if enabled), if IPv6 is not set (or is not enabled) it is redundant to test the IPv4 bit. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
453f85d4 |
|
15-Nov-2017 |
Mel Gorman <mgorman@techsingularity.net> |
mm: remove __GFP_COLD As the page free path makes no distinction between cache hot and cold pages, there is no real useful ordering of pages in the free list that allocation requests can take advantage of. Judging from the users of __GFP_COLD, it is likely that a number of them are the result of copying other sites instead of actually measuring the impact. Remove the __GFP_COLD parameter which simplifies a number of paths in the page allocator. This is potentially controversial but bear in mind that the size of the per-cpu pagelists versus modern cache sizes means that the whole per-cpu list can often fit in the L3 cache. Hence, there is only a potential benefit for microbenchmarks that alloc/free pages in a tight loop. It's even worse when THP is taken into account which has little or no chance of getting a cache-hot page as the per-cpu list is bypassed and the zeroing of multiple pages will thrash the cache anyway. The truncate microbenchmarks are not shown as this patch affects the allocation path and not the free path. A page fault microbenchmark was tested but it showed no significant difference which is not surprising given that the __GFP_COLD branches are a minuscule percentage of the fault path. Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5dad61b8 |
|
11-Oct-2017 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx4_en: Replace netdev parameter with priv in XDP xmit function The struct net_device parameter was passed only to extract struct mlx4_en_priv out of it. Here we pass the priv parameter directly. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
80a8dc75 |
|
09-Oct-2017 |
Inbar Karmy <inbark@mellanox.com> |
net/mlx4_en: Increase number of default RX rings Remove limitation of netif_get_num_default_rss_queues() from logic of RX rings default number. Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
de8f3a83 |
|
24-Sep-2017 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: add meta pointer for direct access This work enables generic transfer of metadata from XDP into skb. The basic idea is that we can make use of the fact that the resulting skb must be linear and already comes with a larger headroom for supporting bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work on a similar principle and introduce a small helper bpf_xdp_adjust_meta() for adjusting a new pointer called xdp->data_meta. Thus, the packet has a flexible and programmable room for meta data, followed by the actual packet data. struct xdp_buff is therefore laid out that we first point to data_hard_start, then data_meta directly prepended to data followed by data_end marking the end of packet. bpf_xdp_adjust_head() takes into account whether we have meta data already prepended and if so, memmove()s this along with the given offset provided there's enough room. xdp->data_meta is optional and programs are not required to use it. The rationale is that when we process the packet in XDP (e.g. as DoS filter), we can push further meta data along with it for the XDP_PASS case, and give the guarantee that a clsact ingress BPF program on the same device can pick this up for further post-processing. Since we work with skb there, we can also set skb->mark, skb->priority or other skb meta data out of BPF, thus having this scratch space generic and programmable allows for more flexibility than defining a direct 1:1 transfer of potentially new XDP members into skb (it's also more efficient as we don't need to initialize/handle each of such new members). The facility also works together with GRO aggregation. The scratch space at the head of the packet can be multiple of 4 byte up to 32 byte large. Drivers not yet supporting xdp->data_meta can simply be set up with xdp->data_meta as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out, such that the subsequent match against xdp->data for later access is guaranteed to fail. 
The verifier treats xdp->data_meta/xdp->data the same way as we treat xdp->data/xdp->data_end pointer comparisons. The requirement for doing the compare against xdp->data is that it hasn't been modified from its original address we got from ctx access. It may have a range marking already from prior successful xdp->data/xdp->data_end pointer comparisons though. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
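The pointer layout and the "data + 1" convention from this commit message can be modelled in userspace C. This is a simplified sketch under stated assumptions: `struct demo_xdp_buff`, `demo_set_meta_invalid()`, and `demo_adjust_meta()` are hypothetical names, the -1 error return is a placeholder, and only the checks named in the message (multiple of 4 bytes, at most 32 bytes, unsupported-driver bail-out) are modelled, plus basic bounds checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the layout: hard_start <= meta <= data <= end. */
struct demo_xdp_buff {
	uint8_t *data_hard_start;
	uint8_t *data_meta;
	uint8_t *data;
	uint8_t *data_end;
};

/* A driver without metadata support parks data_meta one byte past data. */
static void demo_set_meta_invalid(struct demo_xdp_buff *xdp)
{
	xdp->data_meta = xdp->data + 1;
}

/*
 * Move data_meta by offset bytes; a negative offset grows the metadata
 * area towards the headroom. Returns 0 on success, -1 on any failure.
 */
static int demo_adjust_meta(struct demo_xdp_buff *xdp, int offset)
{
	ptrdiff_t headroom, cur_len, new_len;

	if (xdp->data_meta > xdp->data)	/* unsupported by the driver */
		return -1;

	headroom = xdp->data_meta - xdp->data_hard_start;
	cur_len = xdp->data - xdp->data_meta;

	if (offset < 0 && -(ptrdiff_t)offset > headroom)
		return -1;		/* would cross data_hard_start */
	if (offset > 0 && offset > cur_len)
		return -1;		/* would cross data */

	new_len = cur_len - offset;
	if (new_len % 4 != 0 || new_len > 32)
		return -1;		/* multiple of 4, up to 32 bytes */

	xdp->data_meta += offset;
	return 0;
}
```

With the invalid marker set, `data_meta > data` holds, so every adjustment attempt fails, which is the behaviour the message describes for drivers that do not yet support metadata.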
|
#
31975e27 |
|
15-Aug-2017 |
stephen hemminger <stephen@networkplumber.org> |
mlx4: sizeof style usage The kernel coding style is to treat sizeof as a function (i.e. with parentheses), not as an operator. Also use kcalloc and kmalloc_array. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e718fe45 |
|
03-Aug-2017 |
Davide Caratti <dcaratti@redhat.com> |
net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets If the NIC fails to validate the checksum on TCP/UDP, and validation of the IP checksum is successful, the driver subtracts the pseudo-header checksum from the value obtained by the hardware and sets CHECKSUM_COMPLETE. Don't do that if the protocol is IPPROTO_SCTP, otherwise CRC32c validation fails. V2: don't test MLX4_CQE_STATUS_IPV6 if MLX4_CQE_STATUS_IPV4 is set Reported-by: Shuang Li <shuali@redhat.com> Fixes: f8c6455bb04b ("net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
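The protocol check described above can be sketched as a small predicate. This is an illustrative userspace fragment, not the driver's code: `demo_may_use_csum_complete()` and the `DEMO_IPPROTO_*` constants are hypothetical names, although the numeric values are the standard IANA protocol numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* IANA-assigned IP protocol numbers. */
#define DEMO_IPPROTO_TCP    6
#define DEMO_IPPROTO_UDP   17
#define DEMO_IPPROTO_SCTP 132

/*
 * Decide whether a hardware-computed sum may be handed to the stack as
 * CHECKSUM_COMPLETE. SCTP is protected by CRC32c rather than the Internet
 * checksum, so a ones' complement sum is of no use for it.
 */
static bool demo_may_use_csum_complete(uint8_t ipproto)
{
	return ipproto != DEMO_IPPROTO_SCTP;
}
```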
|
#
f3301870 |
|
21-Jun-2017 |
Moshe Shemesh <moshe@mellanox.com> |
(IB, net)/mlx4: Add resource utilization support Adding visibility of resource usage of QPs, CQs and counters used by virtual functions. This feature will be used to give the PF administrator more data while debugging VF status. Usage info was added to ALLOC_RES command, to notify the PF if the resource which is being reserved or allocated for the VF will be used by kernel driver or by user verbs. Updated reservation and allocation functions of QP, CQ and counter with additional usage parameter. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
8900b894 |
|
23-May-2017 |
Leon Romanovsky <leon@kernel.org> |
{net, IB}/mlx4: Remove gfp flags argument The caller to the driver marks GFP_NOIO allocations with help of memalloc_noio-* calls now. This makes it redundant to pass gfp flags down to the driver, since they can only be GFP_KERNEL. The patch removes the gfp flags argument and updates all driver paths. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
6c78511b |
|
15-Jun-2017 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx4_en: Poll XDP TX completion queue in RX NAPI
Instead of having their own NAPIs, XDP TX completion queues get polled within the corresponding RX NAPI. This prevents any possible race on TX ring prod/cons indices, between the context that issues the transmits (RX NAPI) and the context that handles the completions (was previously done in a separate NAPI). This also improves performance, as it decreases the number of NAPIs running on a CPU, saving the overhead of syncing and switching between the contexts.
Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.
XDP_TX packet rate:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 12.0 Mpps | 13.8 Mpps |  15% |
IPv6 | 12.0 Mpps | 13.8 Mpps |  15% |
-------------------------------------
Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
36ea7964 |
|
15-Jun-2017 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx4_en: Improve XDP xmit function
Several performance improvements in XDP TX datapath, including:
- Ring a single doorbell for XDP TX ring per NAPI budget, instead of doing it per a lower threshold (was 8). This includes removing the flow of immediate doorbell ringing in case of a full TX ring.
- Compiler branch predictor hints.
- Calculate values in compile time rather than in runtime.
Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.
XDP_TX packet rate:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 10.3 Mpps | 12.0 Mpps |  17% |
IPv6 | 10.3 Mpps | 12.0 Mpps |  17% |
-------------------------------------
Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9bcee89a |
|
15-Jun-2017 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx4_en: Improve receive data-path
Several small performance improvements in RX datapath, including:
- Compiler branch predictor hints.
- Replace a multiplication with a shift operation.
- Minimize variables scope.
- Write-prefetch for packet header.
- Avoid ternary operator ("?") when value can be preset in a matching branch.
- Save a branch by updating RX ring doorbell within mlx4_en_refill_rx_buffers(), which now returns void.
Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON (enable by ethtool -L <interface> rx 1).
XDP_DROP packet rate: Same (28.1 Mpps), lower CPU utilization (from ~100% to ~92%).
Drop packets in TC:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 4.14 Mpps | 4.18 Mpps |   1% |
-------------------------------------
XDP_TX packet rate:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 10.1 Mpps | 10.3 Mpps |   2% |
IPv6 | 10.1 Mpps | 10.3 Mpps |   2% |
-------------------------------------
Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4931c6ef |
|
15-Jun-2017 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx4_en: Optimized single ring steering
Avoid touching RX QP RSS context when loading with only one RX ring, to allow optimized A0 RX steering.
Enable by:
- loading mlx4_core with module param: log_num_mgm_entry_size = -6.
- then: ethtool -L <interface> rx 1
Performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
XDP_DROP packet rate:
-------------------------------------
     | Before    | After     | Gain |
IPv4 | 20.5 Mpps | 28.1 Mpps |  37% |
IPv6 | 18.4 Mpps | 28.1 Mpps |  53% |
-------------------------------------
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
505a9249 |
|
09-May-2017 |
Kamal Heib <kamalh@mellanox.com> |
net/mlx4_en: Change the error print to debug print The error print within mlx4_en_calc_rx_buf() should be a debug print. Fixes: 51151a16a60f ('mlx4: allow order-0 memory allocations in RX path') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
68b8df46 |
|
08-Mar-2017 |
Eric Dumazet <edumazet@google.com> |
mlx4: remove duplicate code in mlx4_en_process_rx_cq() We should keep one way to build skbs, regardless of GRO being on or off. Note that I made sure to defer as much as possible the point we need to pull data from the frame, so that future prefetch() we might add are more effective. These skb attributes derive from the CQE or ring : ip_summed, csum hash vlan offload hwtstamps queue_mapping As a bonus, this patch removes mlx4 dependency on eth_get_headlen() which is very often broken enough to give us headaches. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6969cf0f |
|
08-Mar-2017 |
Eric Dumazet <edumazet@google.com> |
mlx4: make validate_loopback() more generic Testing a boolean in fast path is not worth duplicating the code allocating packets, when GRO is on or off. If this proves to be a problem, we might later use a jump label. Next patch will remove this duplicated code and ease code review. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
02e6fd3e |
|
08-Mar-2017 |
Eric Dumazet <edumazet@google.com> |
mlx4: factorize page_address() calls We need to compute the frame virtual address at different points. Do it once. Following patch will use the new va address for validate_loopback() Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9e8c0395 |
|
08-Mar-2017 |
Eric Dumazet <edumazet@google.com> |
mlx4: do not access rx_desc from mlx4_en_process_rx_cq() Instead of fetching dma address from rx_desc->data[0].addr, prefer using frags[0].dma + frags[0].page_offset to avoid a potential cache line miss. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7d7bfc6a |
|
08-Mar-2017 |
Eric Dumazet <edumazet@google.com> |
mlx4: add rx_alloc_pages counter in ethtool -S This new counter tracks number of pages that we allocated for one port. lpaa24:~# ethtool -S eth0 | egrep 'rx_alloc_pages|rx_packets' rx_packets: 306755183 rx_alloc_pages: 932897 Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
34db548b |
|
08-Mar-2017 |
Eric Dumazet <edumazet@google.com> |
mlx4: add page recycling in receive path Same technique as some Intel drivers, for arches where PAGE_SIZE = 4096. In most cases, pages are reused because they were consumed before we could loop around the RX ring. This brings back performance, and is even better: a single TCP flow reaches 30Gbit on my hosts. v2: added full memset() in mlx4_en_free_frag(), as Tariq found it was needed if we switch to large MTU, as priv->log_rx_info can dynamically be changed. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b5a54d9a |
|
08-Mar-2017 |
Eric Dumazet <edumazet@google.com> |
mlx4: use order-0 pages for RX Use of order-3 pages is problematic in some cases. This patch might add three kinds of regression : 1) a CPU performance regression, but we will add later page recycling and performance should be back. 2) TCP receiver could grow its receive window slightly slower, because skb->len/skb->truesize ratio will decrease. This is mostly ok, we prefer being conservative to not risk OOM, and eventually tune TCP better in the future. This is consistent with other drivers using 2048 per ethernet frame. 3) Because we allocate one page per RX slot, we consume more memory for the ring buffers. XDP already had this constraint anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
60c7f5ae |
|
08-Mar-2017 |
Eric Dumazet <edumazet@google.com> |
mlx4: removal of frag_sizes[] We will soon use order-0 pages, and frag truesize will more precisely match real sizes. In the new model, we prefer to use <= 2048 bytes fragments, so that we can use page-recycle technique on PAGE_SIZE=4096 arches. We will still pack as many frames as possible on arches with big pages, like PowerPC. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
acd7628d |
|
08-Mar-2017 |
Eric Dumazet <edumazet@google.com> |
mlx4: reduce rx ring page_cache size We only need to store the page and dma address. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d85f6c14 |
|
08-Mar-2017 |
Eric Dumazet <edumazet@google.com> |
mlx4: rx_headroom is a per port attribute No need to duplicate it per RX queue / frags. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aaca121d |
|
08-Mar-2017 |
Eric Dumazet <edumazet@google.com> |
mlx4: get rid of frag_prefix_size Using per frag storage for frag_prefix_size is really silly. mlx4_en_complete_rx_desc() has all needed info already. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
159ddfd2 |
|
08-Mar-2017 |
Eric Dumazet <edumazet@google.com> |
mlx4: remove order field from mlx4_en_frag_info This is really a port attribute, no need to duplicate it per RX queue and per frag. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
69ba9431 |
|
08-Mar-2017 |
Eric Dumazet <edumazet@google.com> |
mlx4: dma_dir is a mlx4_en_priv attribute No need to duplicate it for all queues and frags. num_frags & log_rx_info become u8 to save space. u8 accesses are a bit faster than u16 anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7f0137e2 |
|
22-Feb-2017 |
Eric Dumazet <edumazet@google.com> |
net/mlx4_en: Use __skb_fill_page_desc() Or we might miss the fact that a page was allocated from memory reserves. Fixes: dceeab0e5258 ("mlx4: support __GFP_MEMALLOC for rx") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bd4ce941 |
|
06-Feb-2017 |
Benjamin Poirier <bpoirier@suse.com> |
mlx4: Invoke softirqs after napi_reschedule mlx4 may schedule napi from a workqueue. Afterwards, softirqs are not run in a deterministic time frame and the following message may be logged: NOHZ: local_softirq_pending 08 The problem is the same as what was described in commit ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule") and this patch applies the same fix to mlx4. Fixes: 07841f9d94c1 ("net/mlx4_en: Schedule napi when RX buffers allocation fails") Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a67edbf4 |
|
24-Jan-2017 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: add initial bpf tracepoints This work adds a number of tracepoints to paths that are either considered slow-path or exception-like states, where monitoring or inspecting them would be desirable. For bpf(2) syscall, tracepoints have been placed for main commands when they succeed. In XDP case, tracepoint is for exceptions, that is, e.g. on abnormal BPF program exit such as unknown or XDP_ABORTED return code, or when error occurs during XDP_TX action and the packet could not be forwarded. Both have been split into separate event headers, and can be further extended. Worst case, if they unexpectedly should get into our way in future, they can also be removed [1]. Of course, these tracepoints (like any other) can be analyzed by eBPF itself, etc.
Example output:
# ./perf record -a -e bpf:* sleep 10
# ./perf script
sock_example 6197 [005] 283.980322: bpf:bpf_map_create: map type=ARRAY ufd=4 key=4 val=8 max=256 flags=0
sock_example 6197 [005] 283.980721: bpf:bpf_prog_load: prog=a5ea8fa30ea6849c type=SOCKET_FILTER ufd=5
sock_example 6197 [005] 283.988423: bpf:bpf_prog_get_type: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
sock_example 6197 [005] 283.988443: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[06 00 00 00] val=[00 00 00 00 00 00 00 00]
[...]
sock_example 6197 [005] 288.990868: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[01 00 00 00] val=[14 00 00 00 00 00 00 00]
swapper 0 [005] 289.338243: bpf:bpf_prog_put_rcu: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
[1] https://lwn.net/Articles/705270/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dceeab0e |
|
17-Jan-2017 |
Eric Dumazet <edumazet@google.com> |
mlx4: support __GFP_MEMALLOC for rx Commit 04aeb56a1732 ("net/mlx4_en: allocate non 0-order pages for RX ring with __GFP_NOMEMALLOC") added code that appears to be not needed at that time, since mlx4 never used __GFP_MEMALLOC allocations anyway. As using memory reserves is a must in some situations (swap over NFS or iSCSI), this patch adds this flag. Note that this driver does not reuse pages (yet) so we do not have to add anything else. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6496bbf0 |
|
29-Dec-2016 |
Eugenia Emantayev <eugenia@mellanox.com> |
net/mlx4_en: Fix bad WQE issue Single send WQE in RX buffer should be stamped with software ownership in order to prevent the flow of QP in error in FW once UPDATE_QP is called. Fixes: 9f519f68cfff ('mlx4_en: Not using Shared Receive Queues') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ea3349a0 |
|
07-Dec-2016 |
Martin KaFai Lau <kafai@fb.com> |
mlx4: xdp: Reserve headroom for receiving packet when XDP prog is active Reserve XDP_PACKET_HEADROOM for packet and enable bpf_xdp_adjust_head() support. This patch only affects the code path when XDP is active. After testing, the tx_dropped counter is incremented if the xdp_prog sends more than wire MTU. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b45f0674 |
|
07-Dec-2016 |
Martin KaFai Lau <kafai@fb.com> |
mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs When XDP is active in mlx4, mlx4 is using one page/pkt. At the same time (i.e. when XDP is active), it is currently limiting MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514 in x86. AFAICT, we can at least raise the MTU limit up to PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is doing. It will be useful in the next patch which allows XDP program to extend the packet by adding new header(s). Note: In the earlier XDP patches, there is already existing guard to ensure the page/pkt scheme only applies when XDP is active in mlx4. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
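The arithmetic in this message can be checked with a short C sketch. The helper name `demo_xdp_max_mtu()` is hypothetical; the 14-byte Ethernet header and 4-byte VLAN tag are the standard sizes, while the 1536-byte and 4096-byte buffer sizes in the usage note are assumptions (the former inferred from the message's own 1514 figure, the latter a common x86 page size).

```c
/* Standard header sizes, mirroring the kernel's ETH_HLEN and VLAN_HLEN. */
#define DEMO_ETH_HLEN  14
#define DEMO_VLAN_HLEN  4

/*
 * Largest MTU that fits in a receive buffer of buf_size bytes once the
 * Ethernet header and two VLAN tags are accounted for.
 */
static int demo_xdp_max_mtu(int buf_size)
{
	return buf_size - DEMO_ETH_HLEN - 2 * DEMO_VLAN_HLEN;
}
```

Under those assumptions, `demo_xdp_max_mtu(1536)` gives the old limit of 1514 and `demo_xdp_max_mtu(4096)` gives 4074 for the page-sized buffer.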
|
#
dad42c30 |
|
20-Nov-2016 |
Eric Dumazet <edumazet@google.com> |
mlx4: avoid unnecessary dirtying of critical fields While stressing a 40Gbit mlx4 NIC with busy polling, I found false sharing in mlx4 driver that can be easily avoided. This patch brings an additional 7 % performance improvement in UDP_RR workload. 1) If we received no frame during one mlx4_en_process_rx_cq() invocation, no need to call mlx4_cq_set_ci() and/or dirty ring->cons 2) Do not refill rx buffers if we have plenty of them. This avoids false sharing and allows some bulk/batch optimizations. Page allocator and its locks will thank us. Finally, mlx4_en_poll_rx_cq() should not return 0 if it determined cpu handling NIC IRQ should be changed. We should return budget-1 instead, to not fool net_rx_action() and its netdev_budget. v2: keep AVG_PERF_COUNTER(... polled) even if polled is 0 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2e713283 |
|
15-Nov-2016 |
Eric Dumazet <edumazet@google.com> |
net/mlx4_en: use napi_complete_done() return value Do not rearm interrupts if we are busy polling. mlx4 uses separate CQ for TX and RX, so number of TX interrupts does not change, unfortunately. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Adam Belay <abelay@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
15fca2c8 |
|
02-Nov-2016 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx4_en: Add ethtool statistics for XDP cases XDP statistics are reported in ethtool, in total and per ring, as follows: - xdp_drop: the number of packets dropped by xdp. - xdp_tx: the number of packets forwarded by xdp. - xdp_tx_full: the number of times an xdp forward failed due to a full tx xdp ring. In addition, all packets that are dropped/forwarded by XDP are no longer accounted in rx_packets/rx_bytes of the ring, so that they count traffic that is passed to the stack. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
67f8b1dc |
|
02-Nov-2016 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx4_en: Refactor the XDP forwarding rings scheme Separately manage the two types of TX rings: regular ones, and XDP. Upon an XDP set, do not borrow regular TX rings and convert them into XDP ones, but allocate new ones, unless we hit the max number of rings. Which means that in systems with smaller #cores we will not consume the current TX rings for XDP, while we are still in the num TX limit. XDP TX rings counters are not shown in ethtool statistics. Instead, XDP counters will be added to the respective RX rings in a downstream patch. This has no performance implications. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
57c970c2 |
|
20-Sep-2016 |
Kamal Heib <kamalh@mellanox.com> |
net/mlx4_en: Fix wrong indentation Use tabs instead of spaces before if statement, no functional change. Fixes: e7c1c2c46201 ("mlx4_en: Added self diagnostics test implementation") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
de3d6fa8 |
|
20-Sep-2016 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx4_en: Add branch prediction hints in RX data-path Add likely/unlikely hints to improve branch predictions in the RX data-path. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5737f6c9 |
|
19-Sep-2016 |
Jesper Dangaard Brouer <brouer@redhat.com> |
mlx4: add missed recycle opportunity for XDP_TX on TX failure Correct drop handling for XDP_TX on TX failure was recently added in commit 95357907ae73 ("mlx4: fix XDP_TX is acting like XDP_PASS on TX ring full"). The change missed an opportunity for recycling the RX page, instead of going through the page allocator, like the regular XDP_DROP action does. This patch seizes the opportunity, by going through the XDP_DROP case. Fixes: 95357907ae73 ("mlx4: fix XDP_TX is acting like XDP_PASS on TX ring full") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
95357907 |
|
16-Sep-2016 |
Jesper Dangaard Brouer <brouer@redhat.com> |
mlx4: fix XDP_TX is acting like XDP_PASS on TX ring full The XDP_TX action can fail transmitting the frame in case the TX ring is full or port is down. In case of TX failure it should drop the frame, and not as now call 'break' which is the same as XDP_PASS. Fixes: 9ecc2d86171a ("net/mlx4_en: add xdp forwarding and data write support") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
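The control-flow issue can be illustrated with a small switch statement. This is a userspace sketch with hypothetical names (`demo_xdp_action`, `demo_verdict`, `demo_handle_action()`), not the driver's receive loop: it only shows a failed transmit taking the drop path instead of leaving the switch, which would hand the frame to the stack.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the XDP return codes. */
enum demo_xdp_action { DEMO_XDP_ABORTED, DEMO_XDP_DROP, DEMO_XDP_PASS, DEMO_XDP_TX };

/* What ultimately happens to the frame. */
enum demo_verdict { DEMO_TO_STACK, DEMO_FORWARDED, DEMO_DROPPED };

/* tx_ok models whether the TX ring accepted the frame. */
static enum demo_verdict demo_handle_action(enum demo_xdp_action act, bool tx_ok)
{
	switch (act) {
	case DEMO_XDP_PASS:
		return DEMO_TO_STACK;
	case DEMO_XDP_TX:
		if (tx_ok)
			return DEMO_FORWARDED;
		/* TX ring full or port down: drop, do not pass up. */
		/* fall through */
	case DEMO_XDP_ABORTED:
	case DEMO_XDP_DROP:
	default:
		return DEMO_DROPPED;
	}
}
```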
|
#
326fe02d |
|
03-Sep-2016 |
Brenden Blanco <bblanco@plumgrid.com> |
net/mlx4_en: protect ring->xdp_prog with rcu_read_lock Depending on the preempt mode, the bpf_prog stored in xdp_prog may be freed despite the use of call_rcu inside bpf_prog_put. The situation is possible when running in PREEMPT_RCU=y mode, for instance, since the rcu callback for destroying the bpf prog can run even during the bh handling in the mlx4 rx path. Several options were considered before this patch was settled on: Add a napi_synchronize loop in mlx4_xdp_set, which would occur after all of the rings are updated with the new program. This approach has the disadvantage that as the number of rings increases, the speed of update will slow down significantly due to napi_synchronize's msleep(1). Add a new rcu_head in bpf_prog_aux, to be used by a new bpf_prog_put_bh. The action of the bpf_prog_put_bh would be to then call bpf_prog_put later. Those drivers that consume a bpf prog in a bh context (like mlx4) would then use the bpf_prog_put_bh instead when the ring is up. This has the problem of complexity, in maintaining proper refcnts and rcu lists, and would likely be harder to review. In addition, this approach to freeing must be exclusive with other frees of the bpf prog, for instance a _bh prog must not be referenced from a prog array that is consumed by a non-_bh prog. The placement of rcu_read_lock in this patch is functionally the same as putting an rcu_read_lock in napi_poll. Actually doing so could be a potentially controversial change, but would bring the implementation in line with sk_busy_loop (though of course the nature of those two paths is substantially different), and would also avoid future copy/paste problems with future supporters of XDP. Still, this patch does not take that opinionated option. Testing was done with kernels in either PREEMPT_RCU=y or CONFIG_PREEMPT_VOLUNTARY=y+PREEMPT_RCU=n modes, with neither exhibiting any drawback. 
With PREEMPT_RCU=n, the extra call to rcu_read_lock did not show up in the perf report whatsoever, and with PREEMPT_RCU=y the overhead of rcu_read_lock (according to perf) was the same before/after. In the rx path, rcu_read_lock is eventually called for every packet from netif_receive_skb_internal, so the napi poll call's rcu_read_lock is easily amortized. v2: Remove extra rcu_read_lock in mlx4_en_process_rx_cq body Annotate xdp_prog with __rcu, and convert all usages to rcu_assign or rcu_dereference[_protected] as appropriate. Add explicit mutex lock around rcu_assign instead of xchg loop. Fixes: d576acf0a22 ("net/mlx4_en: add page recycle to prepare rx ring for tx support") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cb7386d3 |
|
20-Jul-2016 |
Brenden Blanco <bblanco@plumgrid.com> |
net/mlx4_en: use READ_ONCE when freeing xdp_prog For consistency, and in order to hint at the synchronous nature of the xdp_prog field, use READ_ONCE in the destroy path of the ring. All occurrences should now use either READ_ONCE or xchg. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ecc2d86 |
|
19-Jul-2016 |
Brenden Blanco <bblanco@plumgrid.com> |
net/mlx4_en: add xdp forwarding and data write support A user will now be able to loop packets back out of the same port using a bpf program attached to xdp hook. Updates to the packet contents from the bpf program is also supported. For the packet write feature to work, the rx buffers are now mapped as bidirectional when the page is allocated. This occurs only when the xdp hook is active. When the program returns a TX action, enqueue the packet directly to a dedicated tx ring, so as to avoid completely any locking. This requires the tx ring to be allocated 1:1 for each rx ring, as well as the tx completion running in the same softirq. Upon tx completion, this dedicated tx ring recycles pages without unmapping directly back to the original rx ring. In steady state tx/drop workload, effectively 0 page allocs/frees will occur. In order to separate out the paths between free and recycle, a free_tx_desc func pointer is introduced that is optionally updated whenever recycle_ring is activated. By default the original free function is always initialized. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d576acf0 |
|
19-Jul-2016 |
Brenden Blanco <bblanco@plumgrid.com> |
net/mlx4_en: add page recycle to prepare rx ring for tx support The mlx4 driver by default allocates order-3 pages for the ring to consume in multiple fragments. When the device has an xdp program, this behavior will prevent tx actions since the page must be re-mapped in TODEVICE mode, which cannot be done if the page is still shared. Start by making the allocator configurable based on whether xdp is running, such that order-0 pages are always used and never shared. Since this will stress the page allocator, add a simple page cache to each rx ring. Pages in the cache are left dma-mapped, and in drop-only stress tests the page allocator is eliminated from the perf report. Note that setting an xdp program will now require the rings to be reconfigured.
Before:
26.91% ksoftirqd/0 [mlx4_en] [k] mlx4_en_process_rx_cq
17.88% ksoftirqd/0 [mlx4_en] [k] mlx4_en_alloc_frags
6.00% ksoftirqd/0 [mlx4_en] [k] mlx4_en_free_frag
4.49% ksoftirqd/0 [kernel.vmlinux] [k] get_page_from_freelist
3.21% swapper [kernel.vmlinux] [k] intel_idle
2.73% ksoftirqd/0 [kernel.vmlinux] [k] bpf_map_lookup_elem
2.57% swapper [mlx4_en] [k] mlx4_en_process_rx_cq
After:
31.72% swapper [kernel.vmlinux] [k] intel_idle
8.79% swapper [mlx4_en] [k] mlx4_en_process_rx_cq
7.54% swapper [kernel.vmlinux] [k] poll_idle
6.36% swapper [mlx4_core] [k] mlx4_eq_int
4.21% swapper [kernel.vmlinux] [k] tasklet_action
4.03% swapper [kernel.vmlinux] [k] cpuidle_enter_state
3.43% swapper [mlx4_en] [k] mlx4_en_prepare_rx_desc
2.18% swapper [kernel.vmlinux] [k] native_irq_return_iret
1.37% swapper [kernel.vmlinux] [k] menu_select
1.09% swapper [kernel.vmlinux] [k] bpf_map_lookup_elem
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
47a38e15 |
|
19-Jul-2016 |
Brenden Blanco <bblanco@plumgrid.com> |
net/mlx4_en: add support for fast rx drop bpf program Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver. In tc/socket bpf programs, helpers linearize skb fragments as needed when the program touches the packet data. However, in the pursuit of speed, XDP programs will not be allowed to use these slower functions, especially if it involves allocating an skb. Therefore, disallow MTU settings that would produce a multi-fragment packet that XDP programs would fail to access. Future enhancements could be done to increase the allowable MTU. The xdp program is present as a per-ring data structure, but as of yet it is not possible to set at that granularity through any ndo. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
30f56e3c |
|
18-Jul-2016 |
Eugenia Emantayev <eugenia@mellanox.com> |
net/mlx4_en: Move filters cleanup to a proper location Filters cleanup should be done once before destroying net device, since filters list is contained in the private data. Fixes: 1eb8c695bda9 ('net/mlx4_en: Add accelerated RFS support') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
82d69203 |
|
04-May-2016 |
Daniel Jurgens <danielj@mellanox.com> |
net/mlx4_en: Fix endianness bug in IPV6 csum calculation Use htons instead of unconditionally byte swapping nexthdr. On little endian systems shifting the byte is correct behavior, but it results in incorrect csums on big endian architectures. Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Carol Soto <clsoto@us.ibm.com> Tested-by: Carol Soto <clsoto@us.ibm.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
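The difference between the shift and htons in the commit above can be sketched in userspace C. This is an illustration only, not the driver code: the function names `nexthdr_shift`, `nexthdr_htons`, and `last_byte_in_memory` are hypothetical. The shift produces a host-order value whose byte layout depends on the CPU, while htons always yields the network byte order layout.

```c
#include <arpa/inet.h>  /* htons */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the unconditional byte shift. Its value matches
 * htons() only on little endian hosts. */
uint16_t nexthdr_shift(uint8_t nexthdr)
{
    return (uint16_t)((uint16_t)nexthdr << 8);
}

/* Hypothetical helper: host-to-network conversion, correct on any host. */
uint16_t nexthdr_htons(uint8_t nexthdr)
{
    return htons((uint16_t)nexthdr);
}

/* Return the byte stored at the higher address of a 16-bit value. */
uint8_t last_byte_in_memory(uint16_t v)
{
    uint8_t b[2];

    memcpy(b, &v, sizeof(b));
    return b[1];
}

/* Worked check: in network order, the next-header value always occupies
 * the second byte in memory, regardless of host endianness. */
void nexthdr_example(void)
{
    assert(last_byte_in_memory(nexthdr_htons(6)) == 6);
}
```

On a little endian host `nexthdr_shift(6)` and `nexthdr_htons(6)` return the same value; on a big endian host only the htons variant keeps the byte in the expected position.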
#
73898db0 |
|
04-May-2016 |
Haggai Abramovsky <hagaya@mellanox.com> |
net/mlx4: Avoid wrong virtual mappings The dma_alloc_coherent() function returns a virtual address which can be used for coherent access to the underlying memory. On some architectures, like arm64, undefined behavior results if this memory is also accessed via virtual mappings that are not coherent. Because of their undefined nature, operations like virt_to_page() return garbage when passed virtual addresses obtained from dma_alloc_coherent(). Any subsequent mappings via vmap() of the garbage page values are unusable and result in bad things like bus errors (synchronous aborts in ARM64 speak). The mlx4 driver contains code that does the equivalent of: vmap(virt_to_page(dma_alloc_coherent)), this results in an OOPs when the device is opened. Prevent Ethernet driver to run this problematic code by forcing it to allocate contiguous memory. As for the Infiniband driver, at first we are trying to allocate contiguous memory, but in case of failure roll back to work with fragmented memory. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reported-by: David Daney <david.daney@cavium.com> Tested-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d21ed3a3 |
|
20-Apr-2016 |
Eran Ben Elisha <eranbe@mellanox.com> |
net/mlx4_en: Split SW RX dropped counter per RX ring Count SW packet drops per RX ring instead of a global counter. This will allow monitoring the number of rx drops per ring. In addition, SW rx_dropped counter was overwritten by HW rx_dropped counter, sum both of them instead to show the accurate value. Fixes: a3333b35da16 ('net/mlx4_en: Moderate ethtool callback to [...] ') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reported-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
851b10d6 |
|
18-Apr-2016 |
Konstantin Khlebnikov <koct9i@gmail.com> |
net/mlx4_en: do batched put_page using atomic_sub This patch fixes a couple of error paths after allocation failures. An atomic set of the page reference counter is safe only if the counter is zero; otherwise the set can race with any speculative get_page_unless_zero. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
|
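The race described in the commit above can be illustrated with C11 atomics in userspace. This is a minimal sketch, not kernel code: `ref_get_unless_zero` and `ref_put_many` are hypothetical stand-ins for the kernel's page reference helpers. It shows why a batched release has to subtract from the counter rather than overwrite it: a subtraction preserves a reference that another party took in the meantime.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a speculative "get unless zero":
 * take a reference only if the object is still live. */
bool ref_get_unless_zero(atomic_int *ref)
{
    int old = atomic_load(ref);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(ref, &old, old + 1))
            return true;
    }
    return false;
}

/* Hypothetical batched put: drop n references in one atomic operation
 * and return the remaining count. */
int ref_put_many(atomic_int *ref, int n)
{
    assert(n > 0);
    return atomic_fetch_sub(ref, n) - n;
}
```

For example, if the owner holds 8 references and a speculative getter takes one more, `ref_put_many(&ref, 7)` leaves 2, whereas storing 1 directly would silently discard the getter's reference.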
#
04aeb56a |
|
18-Apr-2016 |
Konstantin Khlebnikov <koct9i@gmail.com> |
net/mlx4_en: allocate non 0-order pages for RX ring with __GFP_NOMEMALLOC High order pages are optional here since commit 51151a16a60f ("mlx4: allow order-0 memory allocations in RX path"), so here is no reason for depleting reserves. Generic __netdev_alloc_frag() implements the same logic. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fe896d18 |
|
17-Mar-2016 |
Joonsoo Kim <iamjoonsoo.kim@lge.com> |
mm: introduce page reference manipulation functions The success of CMA allocation largely depends on the success of migration and key factor of it is page reference count. Until now, page reference is manipulated by direct calling atomic functions so we cannot follow up who and where manipulate it. Then, it is hard to find actual reason of CMA allocation failure. CMA allocation should be guaranteed to succeed so finding offending place is really important. In this patch, call sites where page reference is manipulated are converted to introduced wrapper function. This is preparation step to add tracepoint to each page reference manipulation function. With this facility, we can easily find reason of CMA allocation failure. There is no functional change in this patch. In addition, this patch also converts reference read sites. It will help a second step that renames page._count to something else and prevents later attempt to direct access to it (Suggested by Andrew). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
93f93a44 |
|
18-Nov-2015 |
Eric Dumazet <edumazet@google.com> |
net: move skb_mark_napi_id() into core networking stack We would like to automatically provide busy polling support to all NAPI drivers, without them having to implement anything. skb_mark_napi_id() can be called from napi_gro_receive() and napi_get_frags(). Few drivers are still calling skb_mark_napi_id() because they use netif_receive_skb(). They should eventually call napi_gro_receive() instead. I will leave this to drivers maintainers. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
868fdb06 |
|
18-Nov-2015 |
Eric Dumazet <edumazet@google.com> |
mlx4: remove mlx4_en_low_latency_recv() Busy polling can now be handled in generic NAPI poll infrastructure. This removes complexity and fast path overhead : mlx4 used two spin_lock()/spin_unlock() pair per napi->poll() call in mlx4_en_cq_lock_napi()/mlx4_en_cq_unlock_napi() Tested: Without busy polling : lpaa23:~# echo 0 >/proc/sys/net/core/busy_read lpaa24:~# echo 0 >/proc/sys/net/core/busy_read lpaa23:~# ./netperf -H lpaa24 -t TCP_RR MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0 Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. per sec 16384 87380 1 1 10.00 47330.78 With busy polling : lpaa23:~# echo 70 >/proc/sys/net/core/busy_read lpaa24:~# echo 70 >/proc/sys/net/core/busy_read lpaa23:~# ./netperf -H lpaa24 -t TCP_RR MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0 Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. per sec 16384 87380 1 1 10.00 97643.55 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4671fc6d |
|
15-Sep-2015 |
Eric Dumazet <edumazet@google.com> |
net/mlx4_en: really allow to change RSS key When changing rss key, we do not want to overwrite user provided key by the one provided by netdev_rss_key_fill(), which is the host random key generated at boot time. Fixes: 947cbb0ac242 ("net/mlx4_en: Support for configurable RSS hash function") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Eyal Perry <eyalpe@mellanox.com> CC: Amir Vadai <amirv@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dc2ec62f |
|
15-Sep-2015 |
Thomas Gleixner <tglx@linutronix.de> |
net/mlx4_en: Use access helper irq_data_get_affinity_mask() This is a preparatory patch for moving irq_data struct members. Search and replace was done with coccinelle Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Amir Vadai <amirv@mellanox.com>
|
#
e38af4fa |
|
27-Jul-2015 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx4_en: Add support for hardware accelerated 802.1ad vlan To enable device support in accelerated 802.1ad vlan, the port capability "packet has vlan enable" (phv_en) should be set. Firmware won't work properly, in case phv_en is not set. The user can enable "phv_en" port capability with the new ethtool private flag phv-bit. The phv-bit private flag default value is OFF, users who are interested in 802.1ad hardware acceleration should turn ON the phv-bit private flag: $ ethtool --set-priv-flags eth1 phv-bit on Once the private flag is set, the device is ready for 802.1ad vlan acceleration. The user should also change the interface device features and turn on "tx-vlan-stag-hw-insert" which is off by default: $ ethtool -K eth1 tx-vlan-stag-hw-insert on "phv-bit" private flag setting is available only for Physical Functions(PF), the Virtual Function (VF) will be able to use the feature by setting "tx-vlan-stag-hw-insert" ethtool device feature only if the feature was enabled by the Hypervisor. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e802f8e4 |
|
27-Jul-2015 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx4: Prepare VLAN macros for 802.1ad Hardware accelerated support To add Hardware accelerated support in 802.1ad vlan, replace Current VLAN macros to CVLAN. Replace: MLX4_WQE_CTRL_INS_VLAN MLX4_CQE_VLAN_PRESENT_MASK With: MLX4_WQE_CTRL_INS_CVLAN MLX4_CQE_CVLAN_PRESENT_MASK Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
62e4c9b4 |
|
22-Jul-2015 |
Ido Shamay <idos@mellanox.com> |
net/mlx4_en: Remove BUG_ON assert when checking if ring is full In mlx4_en_is_ring_empty we check if ring surpassed its size. Since the prod and cons indicators are u32, there might be a state where prod wrapped around and cons, making this assert false, although no actual bug exists (other code segment can cope with this state). Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
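The wrap-around state mentioned in the commit above can be reproduced in a few lines of userspace C. This is a sketch under stated assumptions, not the driver's code: `ring_used` and `ring_is_full` are hypothetical names. Because the producer and consumer indices are unsigned 32-bit values, their difference stays correct after the producer wraps past zero, even though the producer is then numerically smaller than the consumer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: entries currently in use.
 * Unsigned subtraction is performed modulo 2^32, so the result is
 * valid even when prod has wrapped around and cons has not. */
uint32_t ring_used(uint32_t prod, uint32_t cons)
{
    return prod - cons;
}

/* Hypothetical helper: the ring is full once 'size' entries are in use. */
bool ring_is_full(uint32_t prod, uint32_t cons, uint32_t size)
{
    assert(size != 0);
    return ring_used(prod, cons) >= size;
}
```

With `cons = 0xfffffffe` and `prod = 2`, `ring_used` returns 4, so a comparison of the raw indices would flag a problem that does not exist.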
#
0a6d4245 |
|
02-Jul-2015 |
Eric Dumazet <edumazet@google.com> |
mlx4: TCP/UDP packets have L4 hash Mellanox driver has the knowledge if rxhash is a L4 hash, if it receives a non fragmented TCP or UDP frame and NETIF_F_RXCSUM is enabled on netdev. ip_summed value is CHECKSUM_UNNECESSARY in this case. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Ido Shamay <idos@mellanox.com> Acked-by: Ido Shamay <idos@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
79a25852 |
|
25-Jun-2015 |
Ido Shamay <idos@mellanox.com> |
net/mlx4_en: Fix wrong csum complete report when rxvlan offload is disabled The check_csum() function relied on hwtstamp_rx_filter to know if rxvlan offload is disabled. This is wrong since rxvlan offload can be switched on/off regardless of hwtstamp_rx_filter. Also moved check_csum to query CQE information to identify VLAN packets and removed the check of IP packets, since it has been validated before. Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE') Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c66fa19c |
|
31-May-2015 |
Matan Barak <matanb@mellanox.com> |
net/mlx4: Add EQ pool Previously, mlx4_en allocated EQs and used them exclusively. This affected RoCE performance, as applications which are events sensitive were limited to use only the legacy EQs. Change that by introducing an EQ pool. This pool is managed by mlx4_core. EQs are assigned to ports (when there are limited number of EQs, multiple ports could be assigned to the same EQs). An exception to this rule is the ASYNC EQ which handles various events. Legacy EQs are completely removed as all EQs could be shared. When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for EQ serving on a specific port. The core driver calculates which EQ should be assigned to that request. Because IRQs are shared between IB and Ethernet modules, their names only include the PCI device BDF address. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
07841f9d |
|
30-Apr-2015 |
Ido Shamay <idos@mellanox.com> |
net/mlx4_en: Schedule napi when RX buffers allocation fails When the system is out of memory, refilling of RX buffers fails while the driver continues to pass the received packets to the kernel stack. At some point, when all RX buffers deplete, the driver may fall into a sleep, and not recover when memory for new RX buffers is once again available. This is because hardware does not have valid descriptors, so no interrupt will be generated for the driver to return to work in napi context. Fix it by scheduling the napi poll function from the stats_task delayed workqueue, as long as the allocations fail. Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
12b3375f |
|
08-Apr-2015 |
Alexander Duyck <alexander.h.duyck@redhat.com> |
mlx4/mlx5: Use dma_wmb/rmb where appropriate This patch should help to improve the performance of the mlx4 and mlx5 on a number of architectures. For example, on x86 the dma_wmb/rmb equates out to a barrier() call as the architecture is already strongly ordered, and on PowerPC the call works out to a lwsync which is significantly less expensive than the sync call that was being used for wmb. I placed the new barriers between any spots that seemed to be trying to order memory/memory reads or writes; if there are any spots that involved MMIO I left the existing wmb in place as the new barriers cannot order transactions between coherent and non-coherent memories. v2: Reduced the replacements to just the spots where I could clearly identify the usage pattern. Cc: Amir Vadai <amirv@mellanox.com> Cc: Ido Shamay <idos@mellanox.com> Cc: Eli Cohen <eli@mellanox.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f0df3503 |
|
02-Apr-2015 |
Muhammad Mahajna <muhammadm@mellanox.com> |
net/mlx4_en: Add RX-FCS support Enabled when device supports KEEP FCS. When the flag is set, Ethernet FCS is appended to the end of the frame, controlled by ethtool. Signed-off-by: Muhammad Mahajna <muhammadm@mellanox.com> Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e8e7f018 |
|
03-Feb-2015 |
Ido Shamay <idos@mellanox.com> |
net/mlx4_en: Adjust RX frag strides to frag sizes This patch improves memory utilization and therefore the packets rate for special MTU's. Instead of setting the frag_stride to the maximal hard coded frag_size, use the actual frag_size that is set according to the MTU, when setting the stride of the last frag. So, for example, for MTU 1600, where the frag_size of the 2nd frag is 86, the frag_stride is set to 128 instead of 4096. See below:
Before:
frag:0 - size:1536 prefix:0 stride:1536
frag:1 - size:86 prefix:1536 stride:4096
frag 0 allocator: - size:32768 frags:21
frag 1 allocator: - size:32768 frags:8
After:
frag:0 - size:1536 prefix:0 stride:1536
frag:1 - size:86 prefix:1536 stride:128
frag 0 allocator: - size:32768 frags:21
frag 1 allocator: - size:32768 frags:256
Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
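The frags counts quoted in the commit above follow directly from the allocation size and the stride. The sketch below only reproduces that arithmetic; `frags_per_alloc` is a hypothetical name, and how the driver derives the stride from the frag size is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many fragments of a given stride fit into
 * one allocation of alloc_size bytes (integer division, remainder
 * bytes are unused). */
uint32_t frags_per_alloc(uint32_t alloc_size, uint32_t stride)
{
    assert(stride != 0);
    return alloc_size / stride;
}
```

With a 32768-byte allocation, a stride of 4096 yields 8 fragments, a stride of 128 yields 256, and a stride of 1536 yields 21, matching the before and after figures in the message.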
#
b110d2ce |
|
03-Feb-2015 |
Ido Shamay <idos@mellanox.com> |
net/mlx4_en: Print page allocator information After Initialization of page_alloc, print actual allocated page size and number of frags it contains. prints is done only when drv message level is set on the interface. Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
872bf2fb |
|
25-Jan-2015 |
Yishai Hadas <yishaih@mellanox.com> |
net/mlx4_core: Maintain a persistent memory for mlx4 device Maintain a persistent memory that should survive reset flow/PCI error. This comes as a preparation for coming series to support above flows. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d57febe1 |
|
11-Dec-2014 |
Matan Barak <matanb@mellanox.com> |
net/mlx4: Add A0 hybrid steering A0 hybrid steering is a form of high performance flow steering. By using this mode, mlx4 cards use a fast limited table based steering, in order to enable fast steering of unicast packets to a QP. In order to implement A0 hybrid steering we allocate resources from different zones: (1) General range (2) Special MAC-assigned QPs [RSS, Raw-Ethernet] each has its own region. When we create a rss QP or a raw ethernet (A0 steerable and BF ready) QP, we try hard to allocate the QP from range (2). Otherwise, we try hard not to allocate from this range. However, when the system is pushed to its limits and one needs every resource, the allocator uses every region it can. Meaning, when we run out of raw-eth qps, the allocator allocates from the general range (and the special-A0 area is no longer active). If we run out of RSS qps, the mechanism tries to allocate from the raw-eth QP zone. If that is also exhausted, the allocator will allocate from the general range (and the A0 region is no longer active). Note that if a raw-eth qp is allocated from the general range, it attempts to allocate the range such that bits 6 and 7 (blueflame bits) in the QP number are not set. When the feature is used in SRIOV, the VF has to notify the PF what kind of QP attributes it needs. In order to do that, along with the "Eth QP blueflame" bit, we reserve a new "A0 steerable QP". According to the combination of these bits, the PF tries to allocate a suitable QP. In order to maintain backward compatibility (with older PFs), the PF notifies which QP attributes it supports via QUERY_FUNC_CAP command. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ddae0349 |
|
11-Dec-2014 |
Eugenia Emantayev <eugenia@mellanox.co.il> |
net/mlx4: Change QP allocation scheme When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset. The current Ethernet driver code reserves a Tx QP range with 256b alignment. This is wrong because if there are more than 64 Tx QPs in use, QPNs >= base + 65 will have bits 6/7 set. This problem is not specific for the Ethernet driver, any entity that tries to reserve more than 64 BF-enabled QPs should fail. Also, using ranges is not necessary here and is wasteful. The new mechanism introduced here will support reservation for "Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs (when hypervisors support WC in VMs). The flow we use is: 1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation, and request "BF enabled QPs" if BF is supported for the function 2. In the ALLOC_RES FW command, change param1 to: a. param1[23:0] - number of QPs b. param1[31-24] - flags controlling QPs reservation Bit 31 refers to Eth blueflame supported QPs. Those QPs must have bits 6 and 7 unset in order to be used in Ethernet. Bits 24-30 of the flags are currently reserved. When a function tries to allocate a QP, it states the required attributes for this QP. Those attributes are considered "best-effort". If an attribute, such as Ethernet BF enabled QP, is a must-have attribute, the function has to check that attribute is supported before trying to do the allocation. In a lower layer of the code, mlx4_qp_reserve_range masks out the bits which are unsupported. If SRIOV is used, the PF validates those attributes and masks out unsupported attributes as well. In order to notify VFs which attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's mailbox is filled by the PF, which notifies which QP allocation attributes it supports. 
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
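The two bit-level rules in the commit above (BF requires QPN bits 6 and 7 to be unset, and param1 carries the QP count in bits 23:0 with the Eth blueflame flag in bit 31) can be sketched in userspace C. The macro and function names here are hypothetical and are not the driver's identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BF_QPN_MASK        0xc0u        /* bits 6 and 7 (hypothetical name) */
#define ALLOC_COUNT_MASK   0xffffffu    /* param1[23:0] (hypothetical name) */
#define ALLOC_FLAG_ETH_BF  (1u << 31)   /* param1 bit 31 (hypothetical name) */

/* A QPN may be used with BF only if bits 6 and 7 are both clear. */
bool qpn_bf_eligible(uint32_t qpn)
{
    return (qpn & BF_QPN_MASK) == 0;
}

/* Pack the number of QPs and the "Eth BF" request flag into param1. */
uint32_t pack_alloc_param1(uint32_t count, bool want_eth_bf)
{
    assert(count <= ALLOC_COUNT_MASK);
    return count | (want_eth_bf ? ALLOC_FLAG_ETH_BF : 0u);
}
```

For a 256-aligned base such as 0x100, only the first 64 QPNs of the range pass `qpn_bf_eligible`, which is why a contiguous range cannot serve more than 64 BF-enabled Tx QPs.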
#
c58942f2 |
|
11-Dec-2014 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx4_en: Set csum level for encapsulated packets This was dropped by mistake for the napi_gro_frags flow, fix that. Fixes: dd65beac48a5 ('net/mlx4_en: Extend usage of napi_gro_frags') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
947cbb0a |
|
02-Dec-2014 |
Eyal Perry <eyalpe@mellanox.com> |
net/mlx4_en: Support for configurable RSS hash function The ConnectX HW is capable of using one of the following hash functions: Toeplitz and an XOR hash function. This patch extends the implementation of the mlx4_en driver set/get_rxfh callbacks to support getting and setting the RSS hash function used by the device. Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bd635c35 |
|
22-Nov-2014 |
Eric Dumazet <edumazet@google.com> |
mlx4: fix mlx4_en_set_rxfh() mlx4_en_set_rxfh() can crash if no RSS indir table is provided. While we are at it, allow RSS key to be changed with ethtool -X Tested: myhost:~# cat /proc/sys/net/core/netdev_rss_key b6:89:91:f3:b2:c3:c2:90:11:e8:ce:45:e8:a9:9d:1c:f2:f6:d4:53:61:8b:26:3a:b3:9a:57:97:c3:b6:79:4d:2e:d9:66:5c:72:ed:b6:8e:c5:5d:4d:8c:22:67:30:ab:8a:6e:c3:6a myhost:~# ethtool -x eth0 RX flow hash indirection table for eth0 with 8 RX ring(s): 0: 0 1 2 3 4 5 6 7 RSS hash key: b6:89:91:f3:b2:c3:c2:90:11:e8:ce:45:e8:a9:9d:1c:f2:f6:d4:53:61:8b:26:3a:b3:9a:57:97:c3:b6:79:4d:2e:d9:66:5c:72:ed:b6:8e myhost:~# ethtool -X eth0 hkey \ 03:0e:e2:43:fa:82:0e:73:14:2d:c0:68:21:9e:82:99:b9:84:d0:22:e2:b3:64:9f:4a:af:00:fa:cc:05:b4:4a:17:05:14:73:76:58:bd:2f myhost:~# ethtool -x eth0 RX flow hash indirection table for eth0 with 8 RX ring(s): 0: 0 1 2 3 4 5 6 7 RSS hash key: 03:0e:e2:43:fa:82:0e:73:14:2d:c0:68:21:9e:82:99:b9:84:d0:22:e2:b3:64:9f:4a:af:00:fa:cc:05:b4:4a:17:05:14:73:76:58:bd:2f Reported-by: Ben Hutchings <ben@decadent.org.uk> Fixes: b9d1ab7eb42e ("mlx4: use netdev_rss_key_fill() helper") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b9d1ab7e |
|
16-Nov-2014 |
Eric Dumazet <edumazet@google.com> |
mlx4: use netdev_rss_key_fill() helper Use of well known RSS key increases attack surface. Switch to a random one, using generic helper so that all ports share a common key. Also provide ethtool -x support to fetch RSS key Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f8c6455b |
|
09-Nov-2014 |
Shani Michaeli <shanim@mellanox.com> |
net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE When processing received traffic, pass CHECKSUM_COMPLETE status to the stack, with calculated checksum for non TCP/UDP packets (such as GRE or ICMP). Although the stack expects checksum which doesn't include the pseudo header, the HW adds it. To address that, we are subtracting the pseudo header checksum from the checksum value provided by the HW. In the IPv6 case, we also compute/add the IP header checksum which is not added by the HW for such packets. Cc: Jerry Chu <hkchu@google.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
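Removing the pseudo header contribution, as described in the commit above, is a ones' complement subtraction. The following is a simplified 16-bit userspace sketch, not the kernel's checksum API: `csum_fold16`, `csum_add16`, and `csum_sub16` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum by
 * adding the carries back in (twice, in case the first add carries). */
uint16_t csum_fold16(uint32_t sum)
{
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones' complement addition of two 16-bit partial sums. */
uint16_t csum_add16(uint16_t a, uint16_t b)
{
    return csum_fold16((uint32_t)a + b);
}

/* Ones' complement subtraction: add the bitwise complement of b. */
uint16_t csum_sub16(uint16_t a, uint16_t b)
{
    return csum_add16(a, (uint16_t)~b);
}

/* Worked check: if the reported sum includes a pseudo header sum,
 * subtracting that sum recovers the payload-only sum. */
void csum_example(void)
{
    uint16_t payload = 0xabcd, pseudo = 0x1234;
    uint16_t reported = csum_add16(payload, pseudo);

    assert(csum_sub16(reported, pseudo) == payload);
}
```

The same identity holds when the addition carries, for example with a pseudo header sum of 0xf000 and a payload sum of 0x2001.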
#
dd65beac |
|
09-Nov-2014 |
Shani Michaeli <shanim@mellanox.com> |
net/mlx4_en: Extend usage of napi_gro_frags We can call napi_gro_frags for all the received traffic regardless of the checksum status. Specifically, received packets whose status is CHECKSUM_NONE (and soon to be added CHECKSUM_COMPLETE) are eligible for napi_gro_frags as well. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2e1af7d7 |
|
10-Nov-2014 |
Eric Dumazet <edumazet@google.com> |
mlx4: restore conditional call to napi_complete_done() After commit 1a28817282 ("mlx4: use napi_complete_done()") we ended up calling napi_complete_done() in the case NAPI poll consumed all its budget. This added extra interrupt pressure, this patch restores proper behavior. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 1a28817282 ("mlx4: use napi_complete_done()") Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1a288172 |
|
06-Nov-2014 |
Eric Dumazet <edumazet@google.com> |
mlx4: use napi_complete_done() To enable gro_flush_timeout, a driver has to use napi_complete_done() instead of napi_complete(). Tested: Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues) Without this feature, we send back about 305,000 ACK per second. GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet) Setting a timer of 2000 nsec is enough to increase GRO packet sizes and reduce number of ACK packets. (811/19.2 = 42) Receiver performs less calls to upper stacks, less wakes up. This also reduces cpu usage on the sender, as it receives less ACK packets. Note that reducing number of wakes up increases cpu efficiency, but can decrease QPS, as applications wont have the chance to warmup cpu caches doing a partial read of RPC requests/answers if they fit in one skb. B:~# sar -n DEV 1 10 | grep eth0 | tail -1 Average: eth0 811269.80 305732.30 1199462.57 19705.72 0.00 0.00 0.50 B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout B:~# sar -n DEV 1 10 | grep eth0 | tail -1 Average: eth0 811577.30 19230.80 1199916.51 1239.80 0.00 0.00 0.50 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ab25f86 |
|
02-Nov-2014 |
Ido Shamay <idos@mellanox.com> |
net/mlx4_en: Add __GFP_COLD gfp flags in alloc_pages Needed in order to get cache cold pages (L3 flushed) for HW scatter. Otherwise memory may flush those entries when the packet comes from PCI, causing back pressure resulting in BW decrease. Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5f6e9800 |
|
02-Nov-2014 |
Ido Shamay <idos@mellanox.com> |
net/mlx4_en: Remove RX buffers alignment to IP_ALIGN When IP_ALIGN has a non zero value, hardware will write to a non aligned address. The only reader from this address is when copying the header from the first frag into the linear buffer (further access to the IP address will be from the linear buffer, in which the headers are aligned). Since the penalty of non align access by the hardware is greater than the software memcpy, changing the frag_align to always be 0. Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
477b35b4 |
|
29-Oct-2014 |
Eric Dumazet <edumazet@google.com> |
mlx4: use napi_schedule_irqoff() mlx4_en_rx_irq() and mlx4_en_tx_irq() run from hard interrupt context. They can use napi_schedule_irqoff() instead of napi_schedule() Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-By: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c2a3d4b4 |
|
27-Oct-2014 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
net/mlx4_en: Cleanups suggested by clang static checker clang flagged the following. All are actually cosmetic cleanups, not really bugs: drivers/net/ethernet/mellanox/mlx4/en_main.c:233:3: warning: Value stored to 'err' is never read err = -ENOMEM; ^ ~~~~~~~ drivers/net/ethernet/mellanox/mlx4/en_main.c:293:3: warning: Value stored to 'err' is never read err = -ENOMEM; drivers/net/ethernet/mellanox/mlx4/en_netdev.c:648:16: warning: Assigned value is garbage or undefined entry->reg_id = reg_id; ^ ~~~~~~ drivers/net/ethernet/mellanox/mlx4/en_netdev.c:659:2: warning: Function call argument is an uninitialized value mlx4_en_uc_steer_release(priv, priv->dev->dev_addr, *qpn, reg_id); (NOTE: reg_id is only used in the device-managed flow steering path, in which it is always initialized. This is not a bug. Cleanup here is therefore cosmetic only). drivers/net/ethernet/mellanox/mlx4/en_rx.c:122:3: warning: Value stored to 'frag_info' is never read frag_info = &priv->frag_info[i]; ^ ~~~~~~~~~~~~~~~~~~~ Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
98226208 |
|
10-Oct-2014 |
Eric Dumazet <edumazet@google.com> |
mlx4: fix race accessing page->_count It is illegal to use atomic_set(&page->_count, ...) even if we 'own' the page. Other entities in the kernel need to use get_page_unless_zero() to get a reference to the page before testing page properties, so we could lose a refcount increment. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b1b6b4da |
|
18-Sep-2014 |
Ido Shamay <idos@mellanox.com> |
net/mlx4_en: Add mlx4_en_get_cqe helper This function derives the base address of the CQE from the CQE size, and calculates the real CQE context segment in it from the factor (this is like before). Before this change the code used the factor to calculate the base address of the CQE as well. The factor indicates in which segment of the cqe stride the cqe information is located. For 32-byte strides, the segment is 0, and for 64 byte strides, the segment is 1 (bytes 32..63). Using the factor was ok as long as we had only 32 and 64 byte strides. However, with larger strides, the factor is zero, and so cannot be used to calculate the base of the CQE. The helper uses the same method of CQE buffer pulling used by all other components that read the CQE buffer (mlx4_ib driver and libmlx4). Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
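The split described above (base address from the CQE stride, context segment from the factor) can be illustrated in userspace. This is a sketch under stated assumptions, not the driver's code: struct cqe_seg is a 32-byte stand-in for the real CQE layout, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one 32-byte CQE context segment; the real layout is not reproduced. */
struct cqe_seg {
	uint8_t raw[32];
};

/* Base of entry idx depends only on the stride (cqe_sz), never on the factor. */
static struct cqe_seg *get_cqe_base(void *buf, size_t idx, size_t cqe_sz)
{
	assert(cqe_sz >= sizeof(struct cqe_seg));
	return (struct cqe_seg *)((uint8_t *)buf + idx * cqe_sz);
}

/* The factor then selects the 32-byte segment inside that stride. */
static struct cqe_seg *get_cqe_ctx(void *buf, size_t idx, size_t cqe_sz,
				   size_t factor)
{
	return get_cqe_base(buf, idx, cqe_sz) + factor;
}
```

With a 64-byte stride and factor 1, entry 2 resolves to byte offset 2 * 64 + 32 = 160; with a 128-byte stride and factor 0 the base is still computed correctly, which is the case the factor-only calculation could not handle.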
|
#
cfecec56 |
|
05-Sep-2014 |
Eric Dumazet <edumazet@google.com> |
mlx4: only pull headers into skb head Use the new fancy eth_get_headlen() to pull exactly the headers into skb->head. This speeds up GRE traffic (or more generally tunneled traffic), as GRO can aggregate up to 17 MSS per GRO packet instead of 8. (Pulling too much data was forcing GRO to keep 2 frags per MSS) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ca8600e |
|
27-Aug-2014 |
Tom Herbert <therbert@google.com> |
mlx4: Set skb->csum_level for encapsulated checksum Set skb->csum_level instead of skb->encapsulation when indicating CHECKSUM_UNNECESSARY for an encapsulated checksum. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ea1c1af1 |
|
22-Jul-2014 |
Amir Vadai <amirv@mellanox.com> |
net/mlx4_en: Reduce memory consumption on kdump kernel When memory is limited, reduce number of rx and tx rings. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
32b333fe |
|
13-Jul-2014 |
Jason Wang <jasowang@redhat.com> |
mlx4: mark napi id for gro_skb Napi id was not marked for gro_skb, so rx busy loop won't work correctly: the stack never tries to call the low latency receive method because of a zero socket napi id. Fix this by marking napi id for gro_skb. The transaction rate of 1 byte netperf tcp_rr increases by about 50% (from 20531.68 to 30610.88). Cc: Amir Vadai <amirv@mellanox.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d5b8dff0 |
|
08-Jul-2014 |
Yishai Hadas <yishaih@mellanox.com> |
net/mlx4_en: Do not count LLC/SNAP in MTU calculation LLC/SNAP 8 bytes should not be added as part of header calculation. If used, payload will be decreased accordingly. For MTU of 1500 we'll set 1522 instead of 1523. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Liran Liss <liranl@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
35f6f453 |
|
29-Jun-2014 |
Amir Vadai <amirv@mellanox.com> |
net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map IRQ affinity notifier can only have a single notifier - cpu_rmap notifier. Can't use it to track changes in IRQ affinity map. Detect IRQ affinity changes by comparing CPU to current IRQ affinity map during NAPI poll thread. CC: Thomas Gleixner <tglx@linutronix.de> CC: Ben Hutchings <ben@decadent.org.uk> Fixes: 2eacc23 ("net/mlx4_core: Enforce irq affinity changes immediatly") Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
40f2287b |
|
11-May-2014 |
Jiri Kosina <jkosina@suse.cz> |
IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO Modify the various routines used to allocate memory resources which serve QPs in mlx4 to get an input GFP directive. Have the Ethernet driver use GFP_KERNEL in its QP allocations as done prior to this commit, and the IB driver use GFP_NOIO when the IB verbs IB_QP_CREATE_USE_GFP_NOIO QP creation flag is provided. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
2eacc23c |
|
13-May-2014 |
Yuval Atias <yuvala@mellanox.com> |
net/mlx4_core: Enforce irq affinity changes immediatly During heavy traffic, napi is constantly polling the completion queue and no interrupt is fired. Because of that, changes to irq affinity are ignored until traffic is stopped and resumed. By registering to the irq notifier mechanism, and forcing an interrupt when affinity is changed, irq affinity changes will be immediately enforced. Signed-off-by: Yuval Atias <yuvala@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1a91de28 |
|
07-May-2014 |
Joe Perches <joe@perches.com> |
mellanox: Logging message cleanups Use a more current logging style. o Coalesce formats o Add missing spaces for coalesced formats o Align arguments for modified formats o Add missing newlines for some logging messages o Use DRV_NAME as part of format instead of %s, DRV_NAME to reduce overall text. o Use ..., ##__VA_ARGS__ instead of args... in macros o Correct a few format typos o Use a single line message where appropriate Signed-off-by: Joe Perches <joe@perches.com> Acked-By: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
38be0a34 |
|
14-Mar-2014 |
Eric W. Biederman <ebiederm@xmission.com> |
mlx4: Don't receive packets when the napi budget == 0 Processing any incoming packets with a napi budget of 0 is incorrect driver behavior. This matters as netpoll will shortly call drivers with a budget of 0 to avoid receive packet processing happening in hard irq context. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
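The rule above can be modelled with a small userspace sketch. The function and its parameters are hypothetical and only illustrate the early return; they are not the driver's poll routine.

```c
#include <assert.h>

/*
 * Returns how many RX completions a poll pass may process.
 * A budget of 0 (as netpoll may pass) means no RX work at all.
 */
static int rx_work_allowed(int pending, int budget)
{
	assert(pending >= 0);
	if (budget <= 0)
		return 0;
	return pending < budget ? pending : budget;
}
```

The point of the guard is that it comes before any completion is examined, so a zero budget cannot lead to packet processing in hard irq context.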
|
#
bb2146bc |
|
20-Feb-2014 |
Ido Shamay <idos@mellanox.com> |
net/mlx4: Fix limiting number of IRQ's instead of RSS queues This fixes a performance bug introduced by commit 90b1ebe "mlx4: set maximal number of default RSS queues", which limits the number of IRQs opened by core module. The limit should be on the number of queues in the indirection table - rx_rings, and not on the number of IRQ's. Also, limiting on mlx4_core initialization instead of in mlx4_en, prevented using "ethtool -L" to utilize all the CPU's, when performance mode is preferred, since limiting this number to 8 reduces overall packet rate by 15%-50% in multiple TCP streams applications. For example, after running ethtool -L <ethx> rx 16 Packet rate Before the fix 897799 After the fix 1142070 Results were obtained using netperf: S=200 ; ( for i in $(seq 1 $S) ; do ( \ netperf -H 11.7.13.55 -t TCP_RR -l 30 &) ; \ wait ; done | grep "1 1" | awk '{SUM+=$6} END {print SUM}' ) CC: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
02512482 |
|
20-Feb-2014 |
Ido Shamay <idos@mellanox.com> |
net/mlx4: Set number of RX rings in a utility function mlx4_en_add() is too long. Moving the setting of the number of RX rings to a utility function to improve readability and modularity of the code. Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e6a76758 |
|
09-Jan-2014 |
Eric Dumazet <edumazet@google.com> |
net/mlx4_en: call gro handler for encapsulated frames In order to use the native GRO handling of encapsulated protocols on mlx4, we need to call napi_gro_receive() instead of netif_receive_skb() unless busy polling is in action. While we are at it, rename mlx4_en_cq_ll_polling() to mlx4_en_cq_busy_polling() Tested with GRE tunnel : GRO aggregation is now performed on the ethernet device instead of being done later on gre device. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Jerry Chu <hkchu@google.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Acked-By: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
837052d0 |
|
23-Dec-2013 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling When the device tunneling offloads mode is vxlan do the following - call SET_PORT with the relevant setting - add DMFS steering vxlan rule for the device self and multicast mac addresses of the form: {<ETH, outer-mac> <VXLAN, ANY vnid> <ETH, ANY mac>} --> RSS QP - set relevant QPC fields in RSS context and RX ring QPs - in TX flow, set WQE fields to generate HW checksum, and handle gso skbs which are marked for encapsulation such that the HW will segment them properly. - in RX flow, read HW offloaded checksum for encapsulated packets from the CQE - advertise hw_enc_features and NETIF_F_GSO_UDP_TUNNEL to the networking stack Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
69174416 |
|
18-Dec-2013 |
Tom Herbert <therbert@google.com> |
net: mlx4 calls skb_set_hash Drivers should call skb_set_hash to set the hash and its type in an skbuff. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
163561a4 |
|
06-Nov-2013 |
Eugenia Emantayev <eugenia@mellanox.com> |
net/mlx4_en: Datapath structures are allocated per NUMA node For each RX/TX ring and its CQ, allocation is done on a NUMA node that corresponds to the core that the data structure should operate on. The assumption is that the core number is reflected by the ring index. The affected allocations are the ring/CQ data structures, the TX/RX info and the shared HW/SW buffer. For TX rings, each core has rings of all UPs. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
41d942d5 |
|
06-Nov-2013 |
Eugenia Emantayev <eugenia@mellanox.com> |
net/mlx4_en: Datapath resources allocated dynamically Currently all TX/RX rings and completion queues are part of the netdev priv structure and are allocated statically. This patch will change the priv to hold only arrays of pointers and therefore all TX/RX rings and completion queues will be allocated dynamically. This is in preparation for NUMA aware allocations. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
021f1107 |
|
07-Oct-2013 |
Amir Vadai <amirv@mellanox.com> |
net/mlx4_en: Fix pages never dma unmapped on rx This patch fixes a bug introduced by commit 51151a16 (mlx4: allow order-0 memory allocations in RX path). dma_unmap_page is never reached because the condition to detect the last fragment in a page is wrong. offset+frag_stride can't be greater than size, so we need to make sure no additional frag will fit in the page => compare offset + frag_stride + next_frag_size instead. next_frag_size is the same as the current one, since a page is shared only with frags of the same size. CC: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
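The corrected last-fragment test can be written out as a standalone predicate. This is a sketch that follows the comparison stated in the message; the function name and parameter names are hypothetical and the surrounding driver logic is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True when no further fragment of next_frag_size fits in the page
 * after the fragment that starts at offset and occupies frag_stride.
 */
static bool is_last_frag(size_t page_size, size_t offset,
			 size_t frag_stride, size_t next_frag_size)
{
	/* The current fragment itself always lies inside the page. */
	assert(offset + frag_stride <= page_size);
	return offset + frag_stride + next_frag_size > page_size;
}
```

Because the assertion always holds, a check of offset + frag_stride alone against the page size can never be true, which is why the unmap path was unreachable before the fix.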
|
#
70fbe079 |
|
07-Oct-2013 |
Amir Vadai <amirv@mellanox.com> |
net/mlx4_en: Rename name of mlx4_en_rx_alloc members Add page prefix to page related members: @size and @offset into @page_size and @page_offset CC: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8b80cda5 |
|
10-Jul-2013 |
Eliezer Tamir <eliezer.tamir@linux.intel.com> |
net: rename ll methods to busy-poll Rename ndo_ll_poll to ndo_busy_poll. Rename sk_mark_ll to sk_mark_napi_id. Rename skb_mark_ll to skb_mark_napi_id. Correct all users of these functions. Update comments and defines in include/net/busy_poll.h Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
076bb0c8 |
|
10-Jul-2013 |
Eliezer Tamir <eliezer.tamir@linux.intel.com> |
net: rename include/net/ll_poll.h to include/net/busy_poll.h Rename the file and correct all the places where it is included. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51151a16 |
|
23-Jun-2013 |
Eric Dumazet <eric.dumazet@gmail.com> |
mlx4: allow order-0 memory allocations in RX path mlx4 exclusively uses order-2 allocations in RX path, which are likely to fail under memory pressure. We therefore drop frames more than needed. This patch tries order-3, order-2, order-1 and finally order-0 allocations to keep good performance, yet allow allocations if/when memory gets fragmented. By using larger pages, and avoiding unnecessary get_page()/put_page() on compound pages, this patch improves performance as well, lowering false sharing on struct page. Also use GFP_KERNEL allocations in initialization path, as allocating 12 MB (390 order-3 pages) can easily fail with GFP_ATOMIC. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amir Vadai <amirv@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
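The descending-order fallback described above can be sketched as a loop over allocation orders. Everything here is illustrative: the callback type and the two stub allocators are hypothetical stand-ins for a page allocator, and any additional checks the driver performs are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical allocator hook: returns true if an order-N allocation succeeds. */
typedef bool (*try_alloc_fn)(int order);

/* Try max_order first, then each smaller order down to 0; -1 if all fail. */
static int alloc_with_fallback(int max_order, try_alloc_fn try_alloc)
{
	assert(max_order >= 0);
	for (int order = max_order; order >= 0; order--) {
		if (try_alloc(order))
			return order;
	}
	return -1;
}

/* Stub: fragmented memory, only order-1 and order-0 succeed. */
static bool fragmented_allocator(int order)
{
	return order <= 1;
}

/* Stub: memory exhausted, every request fails. */
static bool exhausted_allocator(int order)
{
	(void)order;
	return false;
}
```

Starting high keeps the common case on large pages, while the loop still makes progress when only small orders are available.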
|
#
9e77a2b8 |
|
18-Jun-2013 |
Amir Vadai <amirv@mellanox.com> |
net/mlx4_en: Add Low Latency Socket (LLS) support Add basic support for LLS. Signed-off-by: Amir Vadai <amirv@mellanox.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ec693d47 |
|
23-Apr-2013 |
Amir Vadai <amirv@mellanox.com> |
net/mlx4_en: Add HW timestamping (TS) support The patch allows enabling/disabling HW timestamping for incoming and/or outgoing packets. It adds and initializes all structs and callbacks needed by the kernel TS API. To enable/disable HW timestamping the appropriate ioctl should be used. Currently only HWTSTAMP_FILTER_ALL/NONE and HWTSTAMP_TX_ON/OFF are supported. When enabling TS on receive flow - VLAN stripping will be disabled. All relevant changes were also made in RX/TX flows to consider TS requests and plant HW timestamps into the relevant structures. mlx4_ib was fixed to compile with the new mlx4_cq_alloc() signature. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
86a9bad3 |
|
18-Apr-2013 |
Patrick McHardy <kaber@trash.net> |
net: vlan: add protocol argument to packet tagging functions Add a protocol argument to the VLAN packet tagging functions. In case of HW tagging, we need that protocol available in the ndo_start_xmit functions, so it is stored in a new field in the skb. The new field fits into a hole (on 64 bit) and doesn't increase the skb's size. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b67bfe0d |
|
27-Feb-2013 |
Sasha Levin <sasha.levin@oracle.com> |
hlist: drop the node parameter from iterators I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... 
when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
14f8dc49 |
|
07-Feb-2013 |
Joe Perches <joe@perches.com> |
drivers: net: Remove remaining alloc/OOM messages alloc failures already get standardized OOM messages and a dump_stack. For the affected mallocs around these OOM messages: Converted kmallocs with multiplies to kmalloc_array. Converted a kmalloc/memcpy to kmemdup. Removed now unused stack variables. Removed unnecessary parentheses. Neatened alignment. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Arend van Spriel <arend@broadcom.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c07cb4b0 |
|
06-Feb-2013 |
Yan Burman <yanb@mellanox.com> |
net/mlx4_en: Manage hash of MAC addresses per port As a preparation step for supporting multiple unicast addresses, store MAC addresses in hash table. Remove the radix tree for MAC addresses per QP, as it's not in use. Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6bbb6d99 |
|
06-Feb-2013 |
Yan Burman <yanb@mellanox.com> |
net/mlx4_en: Optimize Rx fast path filter checks Currently, RX path code that does RX filtering is not optimized and does an expensive conversion. In order to use ether_addr_equal_64bits which is optimized for such cases, we need the MAC address kept by the device to be in the form of unsigned char array instead of u64. Store the MAC address as unsigned char array and convert to/from u64 out of the fast path when needed. Side effect of this is that we no longer need priv->mac, since it's the same as dev->dev_addr. This optimization was suggested by Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
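The conversion between a u64 MAC and a byte array, which the message says is moved out of the fast path, might look like the following. This is a sketch with hypothetical function names; it assumes the u64 holds the address in its low 48 bits with the first octet most significant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack the low 48 bits of v into mac[0..5], most significant octet first. */
static void u64_to_mac(uint8_t mac[6], uint64_t v)
{
	assert(mac != NULL);
	for (int i = 5; i >= 0; i--) {
		mac[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Pack mac[0..5] back into the low 48 bits of a u64. */
static uint64_t mac_to_u64(const uint8_t mac[6])
{
	uint64_t v = 0;

	assert(mac != NULL);
	for (int i = 0; i < 6; i++)
		v = (v << 8) | mac[i];
	return v;
}
```

Keeping the address as a byte array means the per-packet comparison works directly on the stored form, and the shifting above only runs when the address is set or read back.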
|
#
79aeaccd |
|
06-Feb-2013 |
Yan Burman <yanb@mellanox.com> |
net/mlx4_en: Optimize loopback related checks in data path Currently there are relatively complex conditional checks in the fast path, for TX loopback enabling and resulting RX filter logic. Move elaborate if's out of data path, replace them with a single flag for each state and update that state from appropriate places. Also, in native (non SRIOV) mode and not in loopback or in selftest, there is no need to try and filter out packets that HW loopback-ed, as in native mode we do not loopback packets anymore. Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
08ff3235 |
|
21-Oct-2012 |
Or Gerlitz <ogerlitz@mellanox.com> |
mlx4: 64-byte CQE/EQE support ConnectX-3 devices can use either 64- or 32-byte completion queue entries (CQEs) and event queue entries (EQEs). Using 64-byte EQEs/CQEs performs better because each entry is aligned to a complete cacheline. This patch queries the HCA's capabilities, and if it supports 64-byte CQEs and EQES the driver will configure the HW to work in 64-byte mode. The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ. Since this mode is global, userspace (libmlx4) must be updated to work with the configured CQE size, and guests using SR-IOV virtual functions need to know both EQE and CQE size. In case one of the 64-byte CQE/EQE capabilities is activated, the patch makes sure that older guest drivers that use the QUERY_DEV_FUNC command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that they need an update to be able to work with the PPF. This is done by changing the returned pf_context_behaviour not to be zero any more. In case none of these capabilities is activated that value remains zero and older guest drivers can run OK. The SRIOV related flow is as follows 1. the PPF does the detection of the new capabilities using QUERY_DEV_CAP command. 2. the PPF activates the new capabilities using INIT_HCA. 3. the VF detects if the PPF activated the capabilities using QUERY_HCA, and if this is the case activates them for itself too. Note that the VF detects that it must be aware to the new PF behaviour using QUERY_FUNC_CAP. Steps 1 and 2 apply also for native mode. User space notification is done through a new field introduced in struct mlx4_ib_ucontext which holds device capabilities for which user space must take action. This changes the binary interface so the ABI towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only when **needed** i.e. only when the driver does use 64-byte CQEs or future device capabilities which must be in sync by user space. 
This practice allows to work with unmodified libmlx4 on older devices (e.g A0, B0) which don't support 64-byte CQEs. In order to keep existing systems functional when they update to a newer kernel that contains these changes in VF and userspace ABI, a module parameter enable_64b_cqe_eqe must be set to enable 64-byte mode; the default is currently false. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
f1d29a3f |
|
15-Nov-2012 |
Ben Hutchings <bhutchings@solarflare.com> |
mlx4_en: Remove remnants of LRO support Commit fa37a9586f92051de03a13e55e5ec3880bb6783e ('mlx4_en: Moving to work with GRO') left behind the Kconfig depends/select, some dead code and comments referring to LRO. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c8c40b7f |
|
02-Aug-2012 |
Amir Vadai <amirv@mellanox.com> |
net/mlx4_en: loopbacked packets are dropped when SMAC=DMAC Should NOT check SMAC=DMAC when: 1. loopback is turned on 2. validate_loopback is true. Fixed it accordingly. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4cce66cd |
|
16-Jul-2012 |
Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> |
mlx4_en: map entire pages to increase throughput In its receive path, mlx4_en driver maps each page chunk that it pushes to the hardware and unmaps it when pushing it up the stack. This limits throughput to about 3Gbps on a Power7 8-core machine. One solution is to map the entire allocated page at once. However, this requires that we keep track of every page fragment we give to a descriptor. We also need to work with the discipline that all fragments will be released (in the sense that it will not be reused by the driver anymore) in the order they are allocated to the driver. This requires that we don't reuse any fragments, every single one of them must be reallocated. We do that by releasing all the fragments that are processed and only after finished processing the descriptors, we start the refill. We also must somehow guarantee that we either refill all fragments in a descriptor or none at all, without resorting to giving up a page fragment that we would have already given. Otherwise, we would break the discipline of only releasing the fragments in the order they were allocated. This has passed page allocation fault injections (restricted to the driver by using required-start and required-end) and device hotplug while 16 TCP streams were able to deliver more than 9Gbps. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1eb8c695 |
|
18-Jul-2012 |
Amir Vadai <amirv@mellanox.com> |
net/mlx4_en: Add accelerated RFS support Use RFS infrastructure and flow steering in HW to keep CPU affinity of rx interrupts and application per TCP stream. A flow steering filter is added to the HW whenever the RFS ndo callback is invoked by core networking code. Because the invocation takes place in interrupt context, the actual setup of HW is done using a workqueue. Whenever a new filter is added, the driver checks for expiry of existing filters. Since there's a window in time between the point where the core RFS code invoked the ndo callback and the point where the HW is configured from the workqueue context, the 2nd, 3rd etc packets from that stream will cause the net core to invoke the callback again and again. To prevent inefficient/double configuration of the HW, the filters are kept in a database which is indexed using a hash function to enable fast access. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cabdc8ee |
|
04-Jul-2012 |
Hadar Hen Zion <hadarh@mellanox.co.il> |
net/mlx4_en: Add support for drop action through ethtool The drop action is implemented by allocating a QP and keeping it in a reset state such that the HW drops any packets which are steered to that QP. When a drop action is requested, we attach the relevant flow to that QP. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0e98b523 |
|
04-Apr-2012 |
Amir Vadai <amirv@mellanox.com> |
net/mlx4_en: Force user priority by QP attribute Instead of relying on HW to change schedule queue by UP, schedule queue is fixed for a tx_ring, and UP in WQE is ignored in this aspect. This resolves two issues with untagged traffic: 1. untagged traffic has no UP in packet which is needed for QoS. The change above allows setting the schedule queue (and by that the UP) of such a stream. 2. BlueFlame uses the same field used by vlan tag. So forcing UP from QPC allows using BF for untagged but prioritized traffic. On old firmware where forcing the UP is not supported, untagged traffic will not be subject to QoS. Because UP is set by QP, we need to always have a tx ring per UP, even if the pfcrx module parameter is false. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
39b2c4eb |
|
05-Mar-2012 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
net/mlx4: fix sparse warnings on wrong type for RSS keys The keys used for the hardware RSS Toeplitz hash are of type __be32, whereas the values provided by the driver come from an array of u32. This triggered a sparse warning about an incorrect type in assignment (different base types). Since these values are picked randomly, the fix is to transform the key to __be32 by executing cpu_to_be32() Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
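A userspace analogue of the conversion above is sketched here. It assumes POSIX htonl() as a stand-in for the kernel's cpu_to_be32(); the function name and the key values used to exercise it are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert each host-order 32-bit key word to big-endian storage order. */
static void rss_key_to_be32(const uint32_t *key, uint32_t *out, size_t n)
{
	assert(key != NULL && out != NULL);
	for (size_t i = 0; i < n; i++)
		out[i] = htonl(key[i]);
}
```

After the conversion the most significant byte of each word is stored first in memory on any host, which is what a big-endian typed field expresses to a static checker.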
|
#
ebf8c9aa |
|
05-Mar-2012 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
net/mlx4_en: Saving mem access on data path Localized the pdev->dev, and using dma_map instead of pci_map There are multiple map/unmap operations on data path, optimizing those by saving redundant pointer access. Those places were identified as hot-spots when running kernel profiling during some benchmarks. The fixes had most impact when testing packet rate with small packets, reducing several % from CPU load, and in some case being the difference between reaching wire speed or being CPU bound. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7e2eb99c |
|
06-Feb-2012 |
Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> |
mlx4: fix DMA mapping leak when allocation fails mlx4_en_prepare_rx_desc does not correctly clean up after it finds an allocation failure. It should unmap a page before calling put_page, but it only calls the latter. This bug would prevent a device removal using hotplug after setting the device MTU to 9000 and opening the network interface. After the fix, we still see the allocation failure with MTU 9000, but we are able to remove the device. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
68355f71 |
|
06-Feb-2012 |
Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> |
mlx4: allow device removal by fixing dma unmap size After opening the network interface, Mellanox ConnectX device cannot be removed by hotplug because it has not properly unmapped all DMA memory. It happens that mlx4_en_activate_rx_rings overrides the variable that keeps the size of the memory mapped. This is fixed by passing to mlx4_en_destroy_rx_ring the same size that is given to mlx4_en_create_rx_ring. After applying this patch, hot unplugging the device works after opening the interface. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c056b734 |
|
04-Feb-2012 |
Pradeep A Dalvi <netdev@pradeepdalvi.com> |
netdev: ethernet dev_alloc_skb to netdev_alloc_skb Replaced deprecated dev_alloc_skb with netdev_alloc_skb in drivers/net/ethernet - Removed extra skb->dev = dev after netdev_alloc_skb Signed-off-by: Pradeep A Dalvi <netdev@pradeepdalvi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e404decb |
|
28-Jan-2012 |
Joe Perches <joe@perches.com> |
drivers/net: Remove unnecessary k.alloc/v.alloc OOM messages alloc failures use dump_stack so emitting an additional out-of-memory message is an unnecessary duplication. Remove the allocation failure messages. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93d3e367 |
|
17-Jan-2012 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
mlx4_en: set number of rx rings used by RSS using ethtool Value must be a power of 2 due to HW limitation. Driver supports only 'equal' mode in ethtool and can't be set by using weights. Signed-off-by: Amir Vadai <amirv@mellanox.co.il> Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
|
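The power-of-2 restriction from 93d3e367 can be checked with a single bit trick. The sketch below is illustrative only: `rx_ring_count_valid` and its `max_rings` parameter are hypothetical names, not the driver's ethtool handler.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical validity check for a requested RSS ring count:
 * non-zero, within the available rings, and a power of 2.
 * A power of 2 has exactly one bit set, so count & (count - 1) is 0.
 */
static bool rx_ring_count_valid(unsigned int count, unsigned int max_rings)
{
    assert(max_rings > 0);

    return count != 0 && count <= max_rings &&
           (count & (count - 1)) == 0;
}
```

With 16 rings available, requests for 1, 2, 4, 8 or 16 rings pass this check, while 6 or 32 would be rejected. The kernel provides `is_power_of_2()` in `linux/log2.h` for the same test.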
#
89efea25 |
|
19-Dec-2011 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
mlx4_en: FIX: Setting default_qpn before using it When UDP RSS is enabled, we use the same QPN for TCP and UDP ranges. The bug is that default_qpn was used as the base UDP QPN before it was set. Fixes bug introduced in commit: 1202d460b1df3a77fda66f4ba5f90ae3527dd42f Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b4c4d36 |
|
12-Dec-2011 |
Eugenia Emantayev <eugenia@mellanox.co.il> |
mlx4_en: Allow communication between functions on same host To enable internal loopback, always fill DMAC in control segment when transmitting the packet. Once this is done, the packet is subject to loopback if the DMAC matches one of the multicast/unicast addresses registered on the physical port. In receive path if source MAC is our own MAC and we are not in selftest, or not in force LB mode - drop this packet. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
|
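The receive-side rule in 5b4c4d36 can be sketched as a small predicate. This follows one literal reading of the commit message (drop when the source MAC is our own and either selftest or forced loopback is inactive); the function name and boolean parameters are hypothetical, and the actual driver condition may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6  /* length of an Ethernet MAC address in bytes */

/*
 * Hypothetical predicate: should a received frame be dropped because
 * it is our own transmission looped back by the hardware?
 */
static bool drop_own_looped_frame(const uint8_t *src_mac,
                                  const uint8_t *own_mac,
                                  bool in_selftest, bool force_lb)
{
    assert(src_mac != NULL && own_mac != NULL);

    bool from_self = memcmp(src_mac, own_mac, MAC_LEN) == 0;

    /* Assumed reading: keep own frames only while both modes are on. */
    return from_self && (!in_selftest || !force_lb);
}
```

Frames from any other source MAC are never dropped by this check, which is what allows two functions on the same host to talk to each other.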
#
1202d460 |
|
26-Nov-2011 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx4: fix UDP RSS related settings Using RSS which takes into account UDP headers is controlled by a module param, fix the setting of the HW RSS context to align with that scheme. So far it was unconditionally allowing hashing on the UDP headers. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
876f6e67 |
|
26-Nov-2011 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx4: move RSS related definitions to be global Towards adding RSS support for IB drivers/application who use the mlx4 HW, make the RSS related definitions global and change the mlx4_en driver to use them. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4a5f4dd8 |
|
14-Nov-2011 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
mlx4_en: Remove FCS bytes from packet length. When HW doesn't remove FCS bytes they are reported in the completion byte count, we don't need to take them to skb. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
|
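The length adjustment in 4a5f4dd8 amounts to subtracting the 4-byte frame check sequence when the hardware leaves it in place. The sketch below is a userspace illustration with hypothetical names (`rx_payload_len`, `hw_strips_fcs`); `FCS_LEN` mirrors the kernel's `ETH_FCS_LEN`.

```c
#include <assert.h>
#include <stdbool.h>

#define FCS_LEN 4u  /* Ethernet frame check sequence is 4 bytes */

/*
 * Hypothetical helper: number of bytes to hand to the skb, given the
 * byte count reported in the completion.
 */
static unsigned int rx_payload_len(unsigned int byte_cnt, bool hw_strips_fcs)
{
    if (hw_strips_fcs)
        return byte_cnt;          /* count already excludes the FCS */

    assert(byte_cnt >= FCS_LEN);  /* a frame with FCS has at least 4 bytes */
    return byte_cnt - FCS_LEN;
}
```

For a minimum-size 64-byte frame reported with its FCS, this yields 60 bytes of packet data.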
#
311761c8 |
|
19-Oct-2011 |
Ian Campbell <Ian.Campbell@citrix.com> |
mlx4: convert to SKB paged frag API. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
90278c9f |
|
19-Oct-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
mlx4_en: fix skb truesize underestimation skb->truesize must account for allocated memory, not the used part of it. Doing this work is important to avoid unexpected OOM situations. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ad86107f |
|
17-Oct-2011 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
mlx4_en: Adding rxhash support Moving to Toeplitz function in RSS calculation. Reporting rxhash in skb. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3b61008d |
|
17-Oct-2011 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
mlx4_en: Recording rx queue for gro packets Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ad04378c |
|
17-Oct-2011 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
mlx4_en: Checksum counters per ring Not updating common counters from data path. The checksum counters are per ring, summarizing them when collecting statistics. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3a9d1f2 |
|
17-Oct-2011 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
mlx4_en: Controlling FCS header removal Canceling FCS removal where FW allows for better alignment of incoming data. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9e903e08 |
|
18-Oct-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: add skb frag size accessors To ease skb->truesize sanitization, it's better to be able to localize all references to skb frags size. Define accessors : skb_frag_size() to fetch frag size, and skb_frag_size_{set|add|sub}() to manipulate it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a2cc190 |
|
13-May-2011 |
Jeff Kirsher <jeffrey.t.kirsher@intel.com> |
mlx4: Move the Mellanox driver Moves the Mellanox driver into drivers/net/ethernet/mellanox/ and make the necessary Kconfig and Makefile changes. CC: Roland Dreier <roland@kernel.org> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|