#
aa4ac90d |
|
11-Apr-2024 |
Tariq Toukan <tariqt@nvidia.com> |
net/mlx5: SD, Handle possible devcom ERR_PTR Check if devcom holds an error pointer and return immediately. This fixes Smatch static checker warning: drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c:221 sd_register() error: 'devcom' dereferencing possible ERR_PTR() Enhance mlx5_devcom_register_component() so it stops returning NULL, making it easier for its callers. Fixes: d3d057666090 ("net/mlx5: SD, Implement devcom communication and primary election") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/all/f09666c8-e604-41f6-958b-4cc55c73faf9@gmail.com/T/ Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://lore.kernel.org/r/20240411115444.374475-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
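The error-pointer convention behind aa4ac90d can be sketched in userspace. This is only an approximation under stated assumptions: `ERR_PTR`, `IS_ERR`, and `PTR_ERR` are re-implemented locally to mimic the helpers in the kernel's `linux/err.h`, and `register_component_sketch()` and `sd_register_sketch()` are hypothetical stand-ins, not the driver's functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace approximations of the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct component { int id; };

/* Hypothetical registration helper: returns a valid pointer or an
 * ERR_PTR-encoded errno, never NULL, so callers need only one check. */
static struct component *register_component_sketch(int fail)
{
	static struct component comp = { .id = 1 };

	if (fail)
		return ERR_PTR(-ENOMEM);
	return &comp;
}

/* Hypothetical caller: returns immediately when handed an error pointer
 * instead of dereferencing it. */
static int sd_register_sketch(int fail)
{
	struct component *c = register_component_sketch(fail);

	if (IS_ERR(c))
		return (int)PTR_ERR(c);
	return c->id > 0 ? 0 : -EINVAL;
}
```

Because the helper never returns NULL, a single `IS_ERR()` test covers every failure path in the caller.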
#
7772dc74 |
|
09-Apr-2024 |
Tariq Toukan <tariqt@nvidia.com> |
net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev Adaptations need to be made for the auxiliary device management in the core driver level. Block this combination for now. Fixes: 678eb448055a ("net/mlx5: SD, Implement basic query and instantiation") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-12-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
0553e753 |
|
09-Apr-2024 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-switch, store eswitch pointer before registering devlink_param Next patch will move devlink register to be first. Therefore, whenever mlx5 will register a param, the user will be notified. In order to notify the user, devlink is using the get() callback of the param. Hence, resources that are being used by the get() callback must be set before the devlink param is registered. Therefore, store eswitch pointer inside mdev before registering the param. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240409190820.227554-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
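The ordering constraint in 0553e753 (state read by a param's get() callback must be set before the param is registered) can be illustrated with a small sketch. All types and functions here are hypothetical simplifications; in particular, `register_param_sketch()` merely imitates a registration that immediately notifies by invoking get(), and is not the devlink API.

```c
#include <assert.h>
#include <stddef.h>

struct eswitch_sketch { int mode; };
struct core_dev_sketch { struct eswitch_sketch *eswitch; };

typedef int (*param_get_fn)(struct core_dev_sketch *dev, int *val);

/* Hypothetical get() callback: relies on dev->eswitch being set. */
static int esw_param_get(struct core_dev_sketch *dev, int *val)
{
	if (!dev->eswitch)
		return -1; /* would be a NULL dereference without this guard */
	*val = dev->eswitch->mode;
	return 0;
}

/* Hypothetical registration: notifies right away by calling get(). */
static int register_param_sketch(struct core_dev_sketch *dev,
				 param_get_fn get, int *notified)
{
	return get(dev, notified);
}

/* Store the eswitch pointer first, then register the param. */
static int eswitch_init_sketch(struct core_dev_sketch *dev,
			       struct eswitch_sketch *esw, int *notified)
{
	dev->eswitch = esw;
	return register_param_sketch(dev, esw_param_get, notified);
}
```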
#
85ea2c5c |
|
10-Jan-2024 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5: E-switch, Change flow rule destination checking The checking in the cited commit is not accurate. In the common case, VF destination is internal, and uplink destination is external. However, uplink destination with packet reformat is considered as internal because firmware uses LB+hairpin to support it. Update the checking so header rewrite rules with both internal and external destinations are not allowed. Fixes: e0e22d59b47a ("net/mlx5: E-switch, Add checking for flow rule destinations") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
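The destination rule stated in 85ea2c5c can be sketched as a small predicate. The structure and function names are hypothetical; only the classification (VF is internal, uplink is external, uplink with packet reformat counts as internal) and the restriction on header rewrite rules come from the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified destination descriptor. */
struct dest_sketch {
	bool is_uplink;        /* destination is the uplink vport */
	bool has_pkt_reformat; /* destination carries a packet reformat */
};

/* An uplink destination with packet reformat is treated as internal. */
static bool dest_is_internal(const struct dest_sketch *d)
{
	return !d->is_uplink || d->has_pkt_reformat;
}

/* Returns true when the rule may be offloaded: a header rewrite rule
 * must not mix internal and external destinations. */
static bool rule_dests_ok(const struct dest_sketch *dests, size_t n,
			  bool header_rewrite)
{
	bool internal = false, external = false;
	size_t i;

	if (!header_rewrite)
		return true;

	for (i = 0; i < n; i++) {
		if (dest_is_internal(&dests[i]))
			internal = true;
		else
			external = true;
	}
	return !(internal && external);
}
```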
#
8deeefb2 |
|
18-Oct-2023 |
Gavin Li <gavinl@nvidia.com> |
Revert "net/mlx5: Block entering switchdev mode with ns inconsistency" This reverts commit 662404b24a4c4d839839ed25e3097571f5938b9b. The revert is required due to the suspicion that the commit serves no purpose and causes a crash. Fixes: 662404b24a4c ("net/mlx5e: Block entering switchdev mode with ns inconsistency") Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
04ad04e4 |
|
06-Oct-2023 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5: Refactor mlx5_flow_destination->rep pointer to vport num Currently the destination rep pointer is only used for comparisons or to obtain vport number from it. Since it is used both during flow creation and deletion it may point to representor of another eswitch instance which can be deallocated during driver unload even when there are rules pointing to it[0]. Refactor the code to store vport number and 'valid' flag instead of the representor pointer. [0]: [176805.886303] ================================================================== [176805.889433] BUG: KASAN: slab-use-after-free in esw_cleanup_dests+0x390/0x440 [mlx5_core] [176805.892981] Read of size 2 at addr ffff888155090aa0 by task modprobe/27280 [176805.895462] CPU: 3 PID: 27280 Comm: modprobe Tainted: G B 6.6.0-rc3+ #1 [176805.896771] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [176805.898514] Call Trace: [176805.899026] <TASK> [176805.899519] dump_stack_lvl+0x33/0x50 [176805.900221] print_report+0xc2/0x610 [176805.900893] ? mlx5_chains_put_table+0x33d/0x8d0 [mlx5_core] [176805.901897] ? esw_cleanup_dests+0x390/0x440 [mlx5_core] [176805.902852] kasan_report+0xac/0xe0 [176805.903509] ? esw_cleanup_dests+0x390/0x440 [mlx5_core] [176805.904461] esw_cleanup_dests+0x390/0x440 [mlx5_core] [176805.905223] __mlx5_eswitch_del_rule+0x1ae/0x460 [mlx5_core] [176805.906044] ? esw_cleanup_dests+0x440/0x440 [mlx5_core] [176805.906822] ? xas_find_conflict+0x420/0x420 [176805.907496] ? down_read+0x11e/0x200 [176805.908046] mlx5e_tc_rule_unoffload+0xc4/0x2a0 [mlx5_core] [176805.908844] mlx5e_tc_del_fdb_flow+0x7da/0xb10 [mlx5_core] [176805.909597] mlx5e_flow_put+0x4b/0x80 [mlx5_core] [176805.910275] mlx5e_delete_flower+0x5b4/0xb70 [mlx5_core] [176805.911010] tc_setup_cb_reoffload+0x27/0xb0 [176805.911648] fl_reoffload+0x62d/0x900 [cls_flower] [176805.912313] ? mlx5e_rep_indr_block_unbind+0xd0/0xd0 [mlx5_core] [176805.913151] ? 
__fl_put+0x230/0x230 [cls_flower] [176805.913768] ? filter_irq_stacks+0x90/0x90 [176805.914335] ? kasan_save_stack+0x1e/0x40 [176805.914893] ? kasan_set_track+0x21/0x30 [176805.915484] ? kasan_save_free_info+0x27/0x40 [176805.916105] tcf_block_playback_offloads+0x79/0x1f0 [176805.916773] ? mlx5e_rep_indr_block_unbind+0xd0/0xd0 [mlx5_core] [176805.917647] tcf_block_unbind+0x12d/0x330 [176805.918239] tcf_block_offload_cmd.isra.0+0x24e/0x320 [176805.918953] ? tcf_block_bind+0x770/0x770 [176805.919551] ? _raw_read_unlock_irqrestore+0x30/0x30 [176805.920236] ? mutex_lock+0x7d/0xd0 [176805.920735] ? mutex_unlock+0x80/0xd0 [176805.921255] tcf_block_offload_unbind+0xa5/0x120 [176805.921909] __tcf_block_put+0xc2/0x2d0 [176805.922467] ingress_destroy+0xf4/0x3d0 [sch_ingress] [176805.923178] __qdisc_destroy+0x9d/0x280 [176805.923741] dev_shutdown+0x1c6/0x330 [176805.924295] unregister_netdevice_many_notify+0x6ef/0x1500 [176805.925034] ? netdev_freemem+0x50/0x50 [176805.925610] ? _raw_spin_lock_irq+0x7b/0xd0 [176805.926235] ? _raw_spin_lock_bh+0xe0/0xe0 [176805.926849] unregister_netdevice_queue+0x1e0/0x280 [176805.927592] ? unregister_netdevice_many+0x10/0x10 [176805.928275] unregister_netdev+0x18/0x20 [176805.928835] mlx5e_vport_rep_unload+0xc0/0x200 [mlx5_core] [176805.929608] mlx5_esw_offloads_unload_rep+0x9d/0xc0 [mlx5_core] [176805.930492] mlx5_eswitch_unload_vf_vports+0x108/0x1a0 [mlx5_core] [176805.931422] ? mlx5_eswitch_unload_sf_vport+0x50/0x50 [mlx5_core] [176805.932304] ? rwsem_down_write_slowpath+0x11f0/0x11f0 [176805.932987] mlx5_eswitch_disable_sriov+0x6f9/0xa60 [mlx5_core] [176805.933807] ? mlx5_core_disable_hca+0xe1/0x130 [mlx5_core] [176805.934576] ? 
mlx5_eswitch_disable_locked+0x580/0x580 [mlx5_core] [176805.935463] mlx5_device_disable_sriov+0x138/0x490 [mlx5_core] [176805.936308] mlx5_sriov_disable+0x8c/0xb0 [mlx5_core] [176805.937063] remove_one+0x7f/0x210 [mlx5_core] [176805.937711] pci_device_remove+0x96/0x1c0 [176805.938289] device_release_driver_internal+0x361/0x520 [176805.938981] ? kobject_put+0x5c/0x330 [176805.939553] driver_detach+0xd7/0x1d0 [176805.940101] bus_remove_driver+0x11f/0x290 [176805.943847] pci_unregister_driver+0x23/0x1f0 [176805.944505] mlx5_cleanup+0xc/0x20 [mlx5_core] [176805.945189] __x64_sys_delete_module+0x2b3/0x450 [176805.945837] ? module_flags+0x300/0x300 [176805.946377] ? dput+0xc2/0x830 [176805.946848] ? __kasan_record_aux_stack+0x9c/0xb0 [176805.947555] ? __call_rcu_common.constprop.0+0x46c/0xb50 [176805.948338] ? fpregs_assert_state_consistent+0x1d/0xa0 [176805.949055] ? exit_to_user_mode_prepare+0x30/0x120 [176805.949713] do_syscall_64+0x3d/0x90 [176805.950226] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [176805.950904] RIP: 0033:0x7f7f42c3f5ab [176805.951462] Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48 [176805.953710] RSP: 002b:00007fff07dc9d08 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [176805.954691] RAX: ffffffffffffffda RBX: 000055b6e91c01e0 RCX: 00007f7f42c3f5ab [176805.955691] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055b6e91c0248 [176805.956662] RBP: 000055b6e91c01e0 R08: 0000000000000000 R09: 0000000000000000 [176805.957601] R10: 00007f7f42d9eac0 R11: 0000000000000206 R12: 000055b6e91c0248 [176805.958593] R13: 0000000000000000 R14: 000055b6e91bfb38 R15: 0000000000000000 [176805.959599] </TASK> [176805.960324] Allocated by task 20490: [176805.960893] kasan_save_stack+0x1e/0x40 [176805.961463] kasan_set_track+0x21/0x30 [176805.962019] __kasan_kmalloc+0x77/0x90 [176805.962554] 
esw_offloads_init+0x1bb/0x480 [mlx5_core] [176805.963318] mlx5_eswitch_init+0xc70/0x15c0 [mlx5_core] [176805.964092] mlx5_init_one_devl_locked+0x366/0x1230 [mlx5_core] [176805.964902] probe_one+0x6f7/0xc90 [mlx5_core] [176805.965541] local_pci_probe+0xd7/0x180 [176805.966075] pci_device_probe+0x231/0x6f0 [176805.966631] really_probe+0x1d4/0xb50 [176805.967179] __driver_probe_device+0x18d/0x450 [176805.967810] driver_probe_device+0x49/0x120 [176805.968431] __driver_attach+0x1fb/0x490 [176805.968976] bus_for_each_dev+0xed/0x170 [176805.969560] bus_add_driver+0x21a/0x570 [176805.970124] driver_register+0x133/0x460 [176805.970684] 0xffffffffa0678065 [176805.971180] do_one_initcall+0x92/0x2b0 [176805.971744] do_init_module+0x22d/0x720 [176805.972318] load_module+0x58c3/0x63b0 [176805.972847] init_module_from_file+0xd2/0x130 [176805.973441] __x64_sys_finit_module+0x389/0x7c0 [176805.974045] do_syscall_64+0x3d/0x90 [176805.974556] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [176805.975566] Freed by task 27280: [176805.976077] kasan_save_stack+0x1e/0x40 [176805.976655] kasan_set_track+0x21/0x30 [176805.977221] kasan_save_free_info+0x27/0x40 [176805.977834] ____kasan_slab_free+0x11a/0x1b0 [176805.978505] __kmem_cache_free+0x163/0x2d0 [176805.979113] esw_offloads_cleanup_reps+0xb8/0x120 [mlx5_core] [176805.979963] mlx5_eswitch_cleanup+0x182/0x270 [mlx5_core] [176805.980763] mlx5_cleanup_once+0x9a/0x1e0 [mlx5_core] [176805.981477] mlx5_uninit_one+0xa9/0x180 [mlx5_core] [176805.982196] remove_one+0x8f/0x210 [mlx5_core] [176805.982868] pci_device_remove+0x96/0x1c0 [176805.983461] device_release_driver_internal+0x361/0x520 [176805.984169] driver_detach+0xd7/0x1d0 [176805.984702] bus_remove_driver+0x11f/0x290 [176805.985261] pci_unregister_driver+0x23/0x1f0 [176805.985847] mlx5_cleanup+0xc/0x20 [mlx5_core] [176805.986483] __x64_sys_delete_module+0x2b3/0x450 [176805.987126] do_syscall_64+0x3d/0x90 [176805.987665] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [176805.988667] Last 
potentially related work creation: [176805.989305] kasan_save_stack+0x1e/0x40 [176805.989839] __kasan_record_aux_stack+0x9c/0xb0 [176805.990443] kvfree_call_rcu+0x84/0xa30 [176805.990973] clean_xps_maps+0x265/0x6e0 [176805.991547] netif_reset_xps_queues.part.0+0x3f/0x80 [176805.992226] unregister_netdevice_many_notify+0xfcf/0x1500 [176805.992966] unregister_netdevice_queue+0x1e0/0x280 [176805.993638] unregister_netdev+0x18/0x20 [176805.994205] mlx5e_remove+0xba/0x1e0 [mlx5_core] [176805.994872] auxiliary_bus_remove+0x52/0x70 [176805.995490] device_release_driver_internal+0x361/0x520 [176805.996196] bus_remove_device+0x1e1/0x3d0 [176805.996767] device_del+0x390/0x980 [176805.997270] mlx5_rescan_drivers_locked.part.0+0x130/0x540 [mlx5_core] [176805.998195] mlx5_unregister_device+0x77/0xc0 [mlx5_core] [176805.998989] mlx5_uninit_one+0x41/0x180 [mlx5_core] [176805.999719] remove_one+0x8f/0x210 [mlx5_core] [176806.000387] pci_device_remove+0x96/0x1c0 [176806.000938] device_release_driver_internal+0x361/0x520 [176806.001612] unbind_store+0xd8/0xf0 [176806.002108] kernfs_fop_write_iter+0x2c0/0x440 [176806.002748] vfs_write+0x725/0xba0 [176806.003294] ksys_write+0xed/0x1c0 [176806.003823] do_syscall_64+0x3d/0x90 [176806.004357] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [176806.005317] The buggy address belongs to the object at ffff888155090a80 which belongs to the cache kmalloc-64 of size 64 [176806.006774] The buggy address is located 32 bytes inside of freed 64-byte region [ffff888155090a80, ffff888155090ac0) [176806.008773] The buggy address belongs to the physical page: [176806.009480] page:00000000a407e0e6 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x155090 [176806.010633] flags: 0x200000000000800(slab|node=0|zone=2) [176806.011352] page_type: 0xffffffff() [176806.011905] raw: 0200000000000800 ffff888100042640 ffffea000422b1c0 dead000000000004 [176806.012949] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [176806.013933] page 
dumped because: kasan: bad access detected [176806.014935] Memory state around the buggy address: [176806.015601] ffff888155090980: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [176806.016568] ffff888155090a00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [176806.017497] >ffff888155090a80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [176806.018438] ^ [176806.019007] ffff888155090b00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [176806.020001] ffff888155090b80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [176806.020996] ================================================================== Fixes: a508728a4c8b ("net/mlx5e: VF tunnel RX traffic offloading") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
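The refactor in 04ad04e4 replaces a borrowed pointer with copied values. A minimal sketch of that idea follows, with hypothetical type and function names: the destination keeps a vport number and a 'valid' flag, so later comparisons never touch a representor that may already have been freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representor; may be freed by another eswitch instance. */
struct rep_sketch { uint16_t vport; };

/* Hypothetical destination record holding plain values, not a pointer. */
struct dest_vport_sketch {
	uint16_t num; /* vport number copied at rule creation */
	bool valid;   /* true when num was filled in */
};

/* Copy what is needed while the representor is known to be alive. */
static void dest_set_vport(struct dest_vport_sketch *d,
			   const struct rep_sketch *rep)
{
	d->valid = rep != NULL;
	d->num = rep ? rep->vport : 0;
}

/* Comparison at deletion time uses only the copied values. */
static bool dest_matches_vport(const struct dest_vport_sketch *d,
			       uint16_t vport)
{
	return d->valid && d->num == vport;
}
```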
#
da75fa54 |
|
13-Nov-2023 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5e: Fix overrun reported by coverity Coverity Scan reports the following issue. However, mlx5_get_dev_index() cannot return 7 for a PF, even if the index is calculated from PCI FUNC ID. Add the check anyway to silence Coverity. CID 610894 (#2 of 2): Out-of-bounds write (OVERRUN) Overrunning array esw->fdb_table.offloads.peer_miss_rules of 4 8-byte elements at element index 7 (byte offset 63) using index mlx5_get_dev_index(peer_dev) (which evaluates to 7). Fixes: 9bee385a6e39 ("net/mlx5: E-switch, refactor FDB miss rule add/remove") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
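The kind of guard added in da75fa54 can be sketched as a range check before the array write. The names below are hypothetical, and `MAX_PORTS_SKETCH` is an assumption chosen to match the 4-element array mentioned in the Coverity report.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PORTS_SKETCH 4 /* assumption: array size from the report */

struct miss_rules_sketch {
	void *peer_miss_rules[MAX_PORTS_SKETCH];
};

/* Reject any index outside the array before writing through it. */
static int set_peer_miss_rule(struct miss_rules_sketch *t, int idx,
			      void *rule)
{
	if (idx < 0 || idx >= MAX_PORTS_SKETCH)
		return -EINVAL;
	t->peer_miss_rules[idx] = rule;
	return 0;
}
```

Even when the out-of-range value cannot occur in practice, the explicit check gives a static analyzer the bound it cannot infer on its own.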
#
7aaf9752 |
|
30-Aug-2023 |
Gavin Li <gavinl@nvidia.com> |
net/mlx5e: Check netdev pointer before checking its net ns Previously, when comparing the net namespaces, the case where the netdev doesn't exist wasn't taken into account, which could cause a crash. In such a case, the comparing function should return false, as there is no netdev->net to compare the devlink->net to. Furthermore, this causes an attempt to enter switchdev mode without a netdev to fail, which is the desired result, as switchdev mode has no meaning without a net device. Fixes: 662404b24a4c ("net/mlx5e: Block entering switchdev mode with ns inconsistency") Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
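The NULL check described in 7aaf9752 amounts to testing the pointer before reading through it. A minimal sketch with hypothetical types (the real code compares kernel net namespace objects):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct net_sketch { int id; };
struct netdev_sketch { const struct net_sketch *net; };

/* Return false when there is no netdev, since there is then no
 * netdev->net to compare the devlink net namespace against. */
static bool netns_matches(const struct netdev_sketch *dev,
			  const struct net_sketch *devlink_net)
{
	if (!dev)
		return false;
	return dev->net == devlink_net;
}
```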
#
baac8351 |
|
10-Oct-2023 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5e: Reduce eswitch mode_lock protection context Currently the eswitch mode_lock is heavy: for example, it is held during the whole process of the mode change, which may need to hold other locks. As the mode_lock is now also used by IPSec to block mode and encap changes, it easily causes lock dependencies. Since some of the protections are also provided by the devlink lock, the eswitch mode_lock is not needed at those places, and thus the possibility of a lockdep issue is reduced. Fixes: c8e350e62fc5 ("net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
bdf788cf |
|
14-Nov-2023 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5e: Don't modify the peer sent-to-vport rules for IPSec offload As IPSec packet offload in switchdev mode is not supported with LAG, it's unnecessary to modify those sent-to-vport rules to the peer eswitch. Fixes: c6c2bf5db4ea ("net/mlx5e: Support IPsec packet offload for TX in switchdev mode") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-9-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
34413460 |
|
05-Sep-2023 |
Bodong Wang <bodong@nvidia.com> |
mlx5/core: E-Switch, Create ACL FT for eswitch manager in switchdev mode ACL flow table is required in switchdev mode when metadata is enabled, and the driver creates such a table when loading each vport. However, not every vport is loaded in switchdev mode, for example the ECPF when it is the eswitch manager. In this case, the ACL flow table is still needed. To make it modularized, create the ACL flow table for the eswitch manager by default and skip such operations when loading the manager vport. Also, there is no need to load the eswitch manager vport in switchdev mode. This means there is no need to load it on regular ConnectX HCAs where the PF is the eswitch manager. This will avoid creating a duplicate ACL flow table for the host PF vport. Fixes: 29bcb6e4fe70 ("net/mlx5e: E-Switch, Use metadata for vport matching in send-to-vport rules") Fixes: eb8e9fae0a22 ("mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager") Fixes: 5019833d661f ("net/mlx5: E-switch, Introduce helper function to enable/disable vports") Signed-off-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b691b111 |
|
25-Aug-2023 |
Dima Chumak <dchumak@nvidia.com> |
net/mlx5: Implement devlink port function cmds to control ipsec_packet Implement devlink port function commands to enable / disable IPsec packet offloads. This is used to control the IPsec capability of the device. When ipsec_packet is enabled for a VF, it prevents adding IPsec packet offloads on the PF, because the two cannot be active simultaneously due to HW constraints. Conversely, if there are any active IPsec packet offloads on the PF, it's not allowed to enable ipsec_packet on a VF, until PF IPsec offloads are cleared. Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-9-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
06bab696 |
|
25-Aug-2023 |
Dima Chumak <dchumak@nvidia.com> |
net/mlx5: Implement devlink port function cmds to control ipsec_crypto Implement devlink port function commands to enable / disable IPsec crypto offloads. This is used to control the IPsec capability of the device. When ipsec_crypto is enabled for a VF, it prevents adding IPsec crypto offloads on the PF, because the two cannot be active simultaneously due to HW constraints. Conversely, if there are any active IPsec crypto offloads on the PF, it's not allowed to enable ipsec_crypto on a VF, until PF IPsec offloads are cleared. Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
e2537341 |
|
25-Aug-2023 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5e: Rewrite IPsec vs. TC block interface In the commit 366e46242b8e ("net/mlx5e: Make IPsec offload work together with eswitch and TC"), new API to block IPsec vs. TC creation was introduced. Internally, that API used devlink lock to avoid races with userspace, but it is not really needed as dev->priv.eswitch is stable and can't be changed. So remove dependency on devlink lock and move block encap code back to its original place. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c46fb773 |
|
25-Aug-2023 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Drop extra layer of locks in IPsec There is no need to hold the devlink lock, as it adds nothing beyond the mode_lock already held in write mode. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-4-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7d833520 |
|
31-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Store vport in struct mlx5_devlink_port and use it in port ops Instead of using internal devlink_port->index to perform vport lookup in every devlink port op, store the vport pointer to the container struct mlx5_devlink_port and use it directly in port ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
eb555e34 |
|
01-Jun-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Check vhca_resource_manager capability in each op and add extack msg Since the follow-up patch is going to remove mlx5_devlink_port_fn_get_vport() entirely, move the vhca_resource_manager capability checking to individual ops. Add proper extack message on the way. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
5c632cc3 |
|
01-Jun-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Relax mlx5_devlink_eswitch_get() return value checking If called from port ops, it is not needed to perform the checks in mlx5_devlink_eswitch_get(). The reason is devlink port would not be registered if the checks are not true. Introduce relaxed version mlx5_devlink_eswitch_nocheck_get() and use it in port ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c0ae0092 |
|
01-Jun-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Return -EOPNOTSUPP in mlx5_devlink_port_fn_migratable_set() directly Instead of initializing "err" variable, just return "-EOPNOTSUPP" directly where it is needed. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2caa2a39 |
|
31-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Reduce number of vport lookups passing vport pointer instead of index During devlink port init/cleanup and register/unregister calls, there are many lookups of vport. Instead of passing vport_num as argument to functions, pass the vport struct pointer directly and avoid repeated lookups. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2c5f33f6 |
|
31-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Embed struct devlink_port into driver structure Struct devlink_port is usually embedded in a driver-specific struct which allows to carry driver context to devlink port ops. Introduce a container struct to include devlink_port struct in preparation to also include driver context for devlink port ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
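The embedding pattern described in 2c5f33f6 is commonly paired with a `container_of()` lookup. The sketch below is a userspace illustration with hypothetical stub types: `container_of` is re-implemented locally with `offsetof`, and `drv_devlink_port_sketch` only imitates a driver container holding a devlink port plus driver context.

```c
#include <assert.h>
#include <stddef.h>

/* Local approximation of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the core and driver structures. */
struct devlink_port_stub { unsigned int index; };
struct vport_stub { unsigned short num; };

/* Driver container: embeds the port and carries driver context. */
struct drv_devlink_port_sketch {
	struct devlink_port_stub dl_port;
	struct vport_stub *vport;
};

/* A port op receives only the embedded struct and recovers the
 * container, avoiding a lookup by index. */
static struct vport_stub *port_to_vport(struct devlink_port_stub *dl_port)
{
	struct drv_devlink_port_sketch *p =
		container_of(dl_port, struct drv_devlink_port_sketch, dl_port);

	return p->vport;
}
```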
#
13f878a2 |
|
01-Jun-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Don't register ops for non-PF/VF/SF port and avoid checks in ops Currently each PF/VF/SF devlink port op called into mlx5 code calls is_port_function_supported() to check if the port is either PF, VF or SF. So make sure that the ops are registered with devlink port only for those and avoid the is_port_function_supported() checks in ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
b940ec4b |
|
26-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Remove no longer used mlx5_esw_offloads_sf_vport_enable/disable() Since the previous patch removed the only users of these functions, remove them. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e855afd7 |
|
26-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Introduce mlx5_eswitch_load/unload_sf_vport() and use it from SF code Similar to the PF/VF helpers, introduce a set of load/unload helpers for SF vports. From there, call mlx5_eswitch_load/unload_vport() which are common for PFs/VFs and newly introduced SF helpers. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d9833bcf |
|
25-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Push devlink port PF/VF init/cleanup calls out of devlink_port_register/unregister() In order to prepare for mlx5_esw_offloads_devlink_port_register/unregister() to be used for SFs as well, push the PF/VF specific init/cleanup calls out. Introduce mlx5_eswitch_load/unload_pf_vf_vport() and call them from there. Use these new helpers for PF/VF loading and make mlx5_eswitch_load/unload_vport() reusable for SFs. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
ba3d85f0 |
|
24-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Call mlx5_esw_offloads_rep_load/unload() for uplink port directly For uplink port, mlx5_esw_offloads_load/unload_rep() are currently called. There are 2 checks inside, which effectively make the functions simple wrappers of mlx5_esw_offloads_rep_load/unload() for uplink port. So avoid one check and indirection and call mlx5_esw_offloads_rep_load/unload() for uplink port directly. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e0e22d59 |
|
18-Apr-2023 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5: E-switch, Add checking for flow rule destinations Firmware doesn't allow flow rules in FDB to do header rewrite and send packets to both internal and uplink vports. The following syndrome is generated when trying to offload such rules: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 23569): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8c8f08), err(-22) To avoid this syndrome, add a check before creating the FTE. If a rule with a header rewrite action forwards packets to both VF and PF, an error is returned directly. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d602be22 |
|
11-Jul-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, Remove redundant arg ignore_flow_lvl The arg is always passed as true and thus redundant. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d1569537 |
|
31-Jul-2023 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5e: Modify and restore TC rules for IPSec TX rules After IPsec policy/state TX rules are added, any TC flow rule that forwards packets to uplink is modified to forward to IPsec TX tables. As these tables are destroyed dynamically whenever there is no reference to them, the destinations of such rules must be restored to uplink. There is a special case for packet encapsulation, as the packet_reformat_id in the extended destination is used to reformat packets, but only for the VPORT destination. To forward a packet to the IPsec table and do encapsulation in one FTE, move the packet_reformat_id to the flow context, instead of using the extended destination. As a limitation, multiple encapsulations with table forwarding, and one together with other VPORT destinations, are not allowed, so add a check when offloading TC rules. TC rules are not allowed before an IPsec TX rule is added, so TC rules only need to be restored after flushing IPSec TX rules. As they are saved in the vport_rep rhashtables, we walk all the rules in the rhashtables, find TC rules with destinations pointing to IPsec tables, and modify them one by one. To avoid concurrency issues, this handling is done under the protection of the eswitch mode_lock. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/7bcb2c7e2ecf0e0d06b095c8dcc6a37ea7f02faf.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
366e4624 |
|
31-Jul-2023 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5e: Make IPsec offload work together with eswitch and TC The eswitch mode is not allowed to change if there are any IPsec rules. Besides, by using mlx5_esw_try_lock() to get eswitch mode lock, IPsec rules are not allowed to be offloaded if there are any TC rules. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/e442b512b21a931fbdfb87d57ae428c37badd58a.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c6c2bf5d |
|
31-Jul-2023 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5e: Support IPsec packet offload for TX in switchdev mode The IPsec encryption is done last, so add a new prio for IPsec offload in FDB, and put it just lower than the slow path prio and higher than the per-vport prio. Three levels are added for TX. The first one is for ip xfrm policy. The sa table is created in the second level for ip xfrm state. The status table is created last to count the number of packets encrypted. The rules that forward packets to uplink are changed to forward them to IPsec TX tables first. These rules are restored after those tables are destroyed, which is done immediately when there is no reference to them, just as is done in legacy mode. The support for slow path is added here, by refreshing uplink's channels. The handling for TC fast path, which is more complicated, will be added later. Besides, reg c4 is used instead to match reqid. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/cfd0e6ffaf0b8c55ebaa9fb0649b7c504b6b8ec6.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
9eca8bb8 |
|
25-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Give esw_offloads_load/unload_rep() "mlx5_" prefix As esw_offloads_load/unload_rep() are used outside eswitch.c it is nicer for them to have "mlx5_" prefix. Add it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
b7186387 |
|
25-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Make mlx5_esw_offloads_rep_load/unload() static mlx5_esw_offloads_rep_load/unload() functions are not used outside of eswitch_offloads.c. Make them static. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
550449d8 |
|
01-Jun-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Don't check vport->enabled in port ops vport->enabled is always set for a vport for which a devlink port is registered, therefore the checks in the ops are pointless. Remove those. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e2bb7984 |
|
23-May-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: E-Switch, Allow devcom initialization on more vports New features could use the devcom interface but not necessarily the lag feature although for vport managers and ECPF still check for lag support. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
1161d22d |
|
22-May-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: E-Switch, Register devcom device with switch id key Register devcom devices with switch id instead of guid. Devcom interface is used to sync between ports in the eswitch, e.g. adding miss rules between the ports. New eswitch devices could have the same guid but a different switch id, so it's more correct to group according to switch id, which identifies whether the ports are on the same eswitch. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
88d162b4 |
|
03-May-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: Devcom, Infrastructure changes Update devcom infrastructure to be more generic, without depending on max supported ports definition or a device guid, and also more encapsulated so callers don't need to pass the register devcom component id per event call. Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
62752c0b |
|
14-Jun-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: DR, Fix peer domain namespace setting The offending patch is based on the assumption that for PFs, mlx5_get_dev_index() is the same as vhca_id. However, this assumption is wrong in case of DPU (ECPF). Fix it by using vhca_id directly, and switch the array of peers to xarray. Fixes: 6d5b7321d8af ("net/mlx5: DR, handle more than one peer domain") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
61eab651 |
|
16-Jul-2023 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: fs_chains: Fix ft prio if ignore_flow_level is not supported The cited commit sets ft prio to fs_base_prio. But if ignore_flow_level is not supported, ft prio must be set based on tc filter prio. Otherwise, all the ft prios are the same on the same chain, which is invalid if ignore_flow_level is not supported. Fix it by setting ft prio based on tc filter prio and setting fs_base_prio to 0 for fdb. Fixes: 8e80e5648092 ("net/mlx5: fs_chains: Refactor to detach chains from tc usage") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
0507f2c8 |
|
03-Jul-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Honor user input for migratable port fn attr Currently, whenever a user sets the migratable port fn attr, the driver always turns the migratable capability on. Fix it by honoring the user input. Fixes: e5b9642a33be ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
899862b6 |
|
31-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Remove redundant check from mlx5_esw_query_vport_vhca_id() Since mlx5_esw_query_vport_vhca_id() could be called either from mlx5_esw_vport_enable() or mlx5_esw_vport_disable() where the check is done, this is always false here. Remove the redundant check. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
15ddd72e |
|
29-May-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: E-Switch, Fix shared fdb error flow On error flow resources being freed in esw_master_egress_destroy_resources() but pointers not being set to null if error flow is from creating a bounce rule. Then in esw_acl_egress_ofld_cleanup() we try to access already freed pointers. Fix it by resetting the pointers to null. Also if error is from creating a second or later bounce rule then the flow group and table being used and cannot and should not be freed. Add a check to destroy the flow group and table if there are no bounce rules. mlx5_core.sf mlx5_core.sf.2: mlx5_destroy_flow_group:2306:(pid 2235): Flow group 4 wasn't destroyed, refcount > 1 mlx5_core.sf mlx5_core.sf.2: mlx5_destroy_flow_table:2295:(pid 2235): Flow table 3 wasn't destroyed, refcount > 1 Fixes: 5e0202eb49ed ("net/mlx5: E-switch, Handle multiple master egress rules") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
ae4de894 |
|
29-May-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: Remove redundant comment The function comment says what it is and the comment is redundant. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
4575ab3b |
|
28-May-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: E-Switch, Pass other_vport flag if vport is not 0 When creating flow table for shared fdb resources, there is only need to pass other_vport flag if vport is not 0 or if the port is ECPF in BlueField. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
70c36438 |
|
27-May-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: E-Switch, Use xarray for devcom paired device index To allow devcom events on E-Switch that is not a vport group manager, use vhca id as an index instead of device index which might be shared between several E-Switches. for example SF and its PF. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
1552e9b5 |
|
27-May-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: E-Switch, Add peer fdb miss rules for vport manager or ecpf Add peer fdb rules for E-Switch that are vport managers or ecpf device. It is not needed for other devices. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8ec91f5d |
|
22-May-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: Lag, Remove duplicate code checking lag is supported Remove duplicate function for checking if device has lag support. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
fa3c73ee |
|
07-Mar-2023 |
Daniel Jurgens <danielj@nvidia.com> |
net/mlx5: Add/remove peer miss rules for EC VFs Add and remove the peer miss rules for EC VFs. It's possible that there are different amounts of total VFs per function so only create rules for the minimum number of max VFs. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
a7719b29 |
|
07-Mar-2023 |
Daniel Jurgens <danielj@nvidia.com> |
net/mlx5: Add management of EC VF vports Add init, load, unload, and cleanup of the EC VF vports. This includes changes in how eswitch SRIOV is managed. Previous on an embedded CPU platform the number of VFs provided when enabling the eswitch was always 0, host VFs vports are handled in the eswitch functions change event handler. Now track the number of EC VFs as well, so they can be handled properly in the enable/disable flows. There are only 3 marks available for use in xarrays, all 3 were already in use for this use case. EC VF vports are in a known range so we can access them by index instead of marks. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
18a92b05 |
|
07-Mar-2023 |
Daniel Jurgens <danielj@nvidia.com> |
net/mlx5: Simplify unload all rep code Instead of using type specific iterators which are only used in one place just traverse the xarray. It will provide suitable ordering based on the vport numbers. This will also eliminate the need for changes here when new types are added. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
97bd788e |
|
06-Jun-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Skip inline mode check after mlx5_eswitch_enable_locked() failure Commit bffaa916588e ("net/mlx5: E-Switch, Add control for inline mode") added inline mode checking to esw_offloads_start() with a warning printed out in case there is a problem. The inline mode checking was done even after the mlx5_eswitch_enable_locked() call failed, which is pointless. Later on, commit 8c98ee77d911 ("net/mlx5e: E-Switch, Add extack messages to devlink callbacks") converted the error/warning prints to extack setting, which caused the inline mode check error to overwrite a possible previous extack message when mlx5_eswitch_enable_locked() failed. The user then gets a confusing error message. Fix this by skipping the inline mode check after the mlx5_eswitch_enable_locked() call failed. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e2a82bf8 |
|
06-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Devcom, extend mlx5_devcom_send_event to work with more than two devices mlx5_devcom_send_event is used to send event from one eswitch to the other. In other words, only one event is sent, which means, no error mechanism is needed. However, In case devcom have more than two eswitches, a proper error mechanism is needed. Hence, in case of error, devcom will perform the error unwind, since devcom knows how many events were successful. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8611df72 |
|
02-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-switch, mark devcom as not ready when all eswitches are unpaired Whenever an eswitch is unpaired from another, the driver marks devcom as not ready. While this is correct when pairing only two eswitches, in order to support pairing of more than two eswitches, the driver needs to mark devcom as not ready only when all eswitches are unpaired. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e67f928a |
|
07-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Devcom, Rename paired to ready In downstream patch devcom will provide support for more than two devices. The term 'paired' will be renamed as 'ready' to convey a more accurate meaning. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
6d5b7321 |
|
21-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: DR, handle more than one peer domain Currently, DR domain is using the assumption that each domain can only have a single peer. In order to support VF LAG of more than two ports, expand peer domain to use an array of peers, and align the code accordingly. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
014e4d48 |
|
02-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-switch, generalize shared FDB creation Shared FDB creation is hard coded for only two eswitches. Generalize shared FDB creation so that any number of eswitches could create shared FDB. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
5e0202eb |
|
22-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-switch, Handle multiple master egress rules Currently, whenever a shared FDB is created, the slave eswitch is creating master egress rule to the master eswitch. In order to support more than two ports, which means there will be more than one slave eswitch, enlarge bounce_rule, which is used to create master egress rule, to an xarray. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
9bee385a |
|
05-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-switch, refactor FDB miss rule add/remove Currently, E-switch FDB have a single peer miss rule. In order to support more than one peer, refactor E-switch FDB to have peer miss rule per peer, and change the code to add/remove a rule from specific peer. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
18e31d42 |
|
05-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-switch, enlarge peer miss group table There is an implicit assumption that peer miss group table require to handle only a single peer. Also, there is an assumption that total_vports of the master is greater or equal to the total_vports of each peer. Change the code to support peer miss group for more than one peer. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
9be6c21f |
|
06-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5e: Handle offloads flows per peer Currently, E-switch offloads table have a list of all flows that create a peer_flow over the peer eswitch. In order to support more than one peer, extend E-switch offloads table peer_flow to hold an array of lists, where each peer have dedicate index via mlx5_get_dev_index(). Thereafter, extend original flow to hold an array of peers as well. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
ed7a8fe7 |
|
30-Mar-2022 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5e: rep, store send to vport rules per peer Each representor, for each send queue, is holding a send_to_vport rule for the peer eswitch. In order to support more than one peer, and to map between the peer rules and peer eswitches, refactor representor to hold both the peer rules and pointer to the peer eswitches. This enables mlx5 to store send_to_vport rules per peer, where each peer have dedicate index via mlx5_get_dev_index(). Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
71c93e37 |
|
25-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
devlink: move port_fn_hw_addr_get/set() to devlink_port_ops Move port_fn_hw_addr_get/set() from devlink_ops into newly introduced devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8c253dfc |
|
06-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-switch, Devcom, sync devcom events and devcom comp register devcom events are sent to all registered component. Following the cited patch, it is possible for two components, e.g.: two eswitches, to send devcom events, while both components are registered. This means eswitch layer will do double un/pairing, which is double allocation and free of resources, even though only one un/pairing is needed. flow example: cpu0 cpu1 ---- ---- mlx5_devlink_eswitch_mode_set(dev0) esw_offloads_devcom_init() mlx5_devcom_register_component(esw0) mlx5_devlink_eswitch_mode_set(dev1) esw_offloads_devcom_init() mlx5_devcom_register_component(esw1) mlx5_devcom_send_event() mlx5_devcom_send_event() Hence, check whether the eswitches are already un/paired before free/allocation of resources. Fixes: 09b278462f16 ("net: devlink: enable parallel ops on netlink interface") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2be5bd42 |
|
20-Mar-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Handle pairing of E-switch via uplink un/load APIs In case the user switches a device from switchdev mode to legacy mode, mlx5 first unpairs the E-switch and afterwards unloads the uplink vport. On the other hand, in case the user removes or reloads a device, mlx5 first unloads the uplink vport and afterwards unpairs the E-switch. The latter is causing a bug[1], hence, handle pairing of E-switch as part of uplink un/load APIs. [1] In case VF_LAG is used, every tc fdb flow is duplicated to the peer esw. However, the original esw keeps a pointer to this duplicated flow, not the peer esw. e.g.: if the user creates a tc fdb flow over esw0, the flow is duplicated over esw1, in FW/HW, but in SW, esw0 keeps a pointer to the duplicated flow. During module unload while a peer tc fdb flow is still offloaded, in case the first device to be removed is the peer device (esw1 in the example above), the peer net-dev is destroyed, and so the mlx5e_priv is memset to 0. Afterwards, the peer device tries to unpair itself from the original device (esw0 in the example above). The unpair API invokes the original device to clear the peer flow from its eswitch (esw0), but the peer flow, which is stored over the original eswitch (esw0), tries to use the peer mlx5e_priv, which is memset to 0, and this results in the kernel oops below.
[ 157.964081 ] BUG: unable to handle page fault for address: 000000000002ce60 [ 157.964662 ] #PF: supervisor read access in kernel mode [ 157.965123 ] #PF: error_code(0x0000) - not-present page [ 157.965582 ] PGD 0 P4D 0 [ 157.965866 ] Oops: 0000 [#1] SMP [ 157.967670 ] RIP: 0010:mlx5e_tc_del_fdb_flow+0x48/0x460 [mlx5_core] [ 157.976164 ] Call Trace: [ 157.976437 ] <TASK> [ 157.976690 ] __mlx5e_tc_del_fdb_peer_flow+0xe6/0x100 [mlx5_core] [ 157.977230 ] mlx5e_tc_clean_fdb_peer_flows+0x67/0x90 [mlx5_core] [ 157.977767 ] mlx5_esw_offloads_unpair+0x2d/0x1e0 [mlx5_core] [ 157.984653 ] mlx5_esw_offloads_devcom_event+0xbf/0x130 [mlx5_core] [ 157.985212 ] mlx5_devcom_send_event+0xa3/0xb0 [mlx5_core] [ 157.985714 ] esw_offloads_disable+0x5a/0x110 [mlx5_core] [ 157.986209 ] mlx5_eswitch_disable_locked+0x152/0x170 [mlx5_core] [ 157.986757 ] mlx5_eswitch_disable+0x51/0x80 [mlx5_core] [ 157.987248 ] mlx5_unload+0x2a/0xb0 [mlx5_core] [ 157.987678 ] mlx5_uninit_one+0x5f/0xd0 [mlx5_core] [ 157.988127 ] remove_one+0x64/0xe0 [mlx5_core] [ 157.988549 ] pci_device_remove+0x31/0xa0 [ 157.988933 ] device_release_driver_internal+0x18f/0x1f0 [ 157.989402 ] driver_detach+0x3f/0x80 [ 157.989754 ] bus_remove_driver+0x70/0xf0 [ 157.990129 ] pci_unregister_driver+0x34/0x90 [ 157.990537 ] mlx5_cleanup+0xc/0x1c [mlx5_core] [ 157.990972 ] __x64_sys_delete_module+0x15a/0x250 [ 157.991398 ] ? exit_to_user_mode_prepare+0xea/0x110 [ 157.991840 ] do_syscall_64+0x3d/0x90 [ 157.992198 ] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows") Fixes: 1418ddd96afd ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
7eb197fd |
|
23-Apr-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, Use metadata matching for RoCE loopback rule Use metadata matching for RoCE loopback rule if device is configured to use metadata for source port matching. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
bea416c7 |
|
23-Apr-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: E-Switch, Check device is PF when stopping esw offloads Checking sriov is done on the pci device so it can return true on other devices like SF but nothing should be done in this case. Add a check that the device is PF. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
29bcb6e4 |
|
02-Apr-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: E-Switch, Use metadata for vport matching in send-to-vport rules Like other rules use metadata matching if supported instead of source_port. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
edab80b8 |
|
30-Jan-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: E-Switch, Remove flow_source check for metadata matching There is no reason to check for flow_source cap to allow metadata matching. When flow_source match is being used the flow_source cap is being checked. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
4c818930 |
|
10-Mar-2023 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Don't destroy indirect table in split rule Source port rewrite (forward to ovs internal port or stack device) isn't supported in the rule of split action. So there is no indirect table in split rule. The cited commit destroys the indirect table in split rule. The indirect table for other rules will be destroyed wrongly. It will cause traffic loss. Fix it by removing the destroy function in split rule. And also remove the destroy function in error flow. Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
fd745f4c |
|
10-Mar-2023 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Create per vport table based on devlink encap mode Currently when creating per vport table, create flags are hardcoded. Devlink encap mode is set based on user input and HW capability. Create per vport table based on devlink encap mode. Fixes: c796bb7cd230 ("net/mlx5: E-switch, Generalize per vport table API") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
38d9a740 |
|
21-Mar-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, Remove unused mlx5_esw_offloads_vport_metadata_set() Remove unused function which also seems a duplicate of esw_port_metadata_set(). Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
0a431418 |
|
20-Mar-2023 |
Maher Sanalla <msanalla@nvidia.com> |
Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports" This reverts commit 606e6a72e29dff9e3341c4cc9b554420e4793f401 which exposes the vnic diagnostic counters via debugfs. Instead, The upcoming series will expose the same counters through devlink health reporter. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
acc10929 |
|
13-Apr-2023 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Allow blocking encap changes in eswitch Existing eswitch encap option enables header encapsulation. Unfortunately currently available hardware isn't able to perform double encapsulation, which can happen once IPsec packet offload tunnel mode is used together with encap mode set to BASIC. So as a solution for misconfiguration, provide an option to block encap changes, which will be used for IPsec packet offload. Reviewed-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8e80e564 |
|
14-Mar-2023 |
Paul Blakey <paulb@nvidia.com> |
net/mlx5: fs_chains: Refactor to detach chains from tc usage To support more generic chains that will be used on other namespaces and without tc, refactor to remove the dependency on tc terms. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Link: https://lore.kernel.org/r/bb8570d532d569285b5bff981578507bd15350cb.1678714336.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
662404b2 |
|
08-Feb-2023 |
Gavin Li <gavinl@nvidia.com> |
net/mlx5e: Block entering switchdev mode with ns inconsistency Upon entering switchdev mode, VF/SF representors are spawned in the devlink instance's net namespace, whereas the PF net device transforms into the uplink representor, remaining in the net namespace the PF net device was in. Therefore, if a PF net device's namespace is different from its parent devlink net namespace, entering switchdev mode can create an illegal situation where all representors sharing the same core device are NOT in the same net namespace. To avoid this issue, block entering switchdev mode for devices whose child netdev net namespace has diverged from the parent devlink's. Fixes: 7768d1971de6 ("net/mlx5: E-Switch, Add control for encapsulation") Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
1313d78a |
|
07-Feb-2023 |
Maor Dickman <maord@nvidia.com> |
net/mlx5: E-switch, Fix wrong usage of source port rewrite in split rules In few cases, rules with mirror use case are split to two FTEs, one which do the mirror action and forward to second FTE which do the rest of the rule actions and the second redirect action. In case of mirror rules which do split and forward to ovs internal port or VF stack devices, source port rewrite should be used in the second FTE but it is wrongly also set in the first FTE which break the offload. Fix this issue by removing the wrong check if source port rewrite is needed to be used on the first FTE of the split and instead return EOPNOTSUPP which will block offload of rules which mirror to ovs internal port or VF stack devices which isn't supported. Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading") Fixes: a508728a4c8b ("net/mlx5e: VF tunnel RX traffic offloading") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
1bf8b0da |
|
30-Jan-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: Verify flow_source cap before using it When adding send to vport rule verify flow_source matching is supported by checking the flow_source cap. Fixes: d04442540372 ("net/mlx5: E-Switch, set flow source for send to uplink rule") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8ce81fc0 |
|
01-Dec-2022 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: TC, Add peer flow in mpesw mode While at it rename mlx5_lag_mpesw_is_activated() to mlx5_lag_is_mpesw() to be consistent with checking if other lag modes are activated. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
633ad4b2 |
|
21-Sep-2022 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: Remove redundant code for handling vlan actions Remove unused code which was used only with deprecated HW which didn't support vlan actions. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d2a651ef |
|
26-Jan-2023 |
Jiri Pirko <jiri@nvidia.com> |
net/mlx5: Move eswitch port metadata devlink param to flow eswitch code Move the param registration and handling code into the eswitch offloads code as they are related to each other. No point in having the devlink param registration done in separate file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
efb4879f |
|
23-Oct-2022 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5e: Use read lock for eswitch get callbacks In commit 367dfa121205 ("net/mlx5: Remove devl_unlock from mlx5_eswtich_mode_callback_enter") all functions were converted to use write lock without relation to their actual purpose. Change the devlink eswitch getters to use read and not write locks. Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
521933cd |
|
20-Dec-2022 |
Maor Dickman <maord@nvidia.com> |
net/mlx5e: Support Geneve and GRE with VF tunnel offload Today VF tunnel offload (tunnel endpoint is on VF) is implemented by indirect table which use rules that match on VXLAN VNI to recirculated to root table, this limit the support for only VXLAN tunnels. This patch change indirect table to use one single match all rule to recirculated to root table which is added when any tunnel decap rule is added with tunnel endpoint is VF. This allow support of Geneve and GRE with this configuration. Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
cd4f186d |
|
15-Dec-2022 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-switch, Coverity: overlapping copy When a capability is set via port function caps callbacks, a memcpy() is performed in which the source and the target are the same address, i.e. the copy is redundant. Hence, remove it. Discovered by Coverity. Fixes: 7db98396ef45 ("net/mlx5: E-Switch, Implement devlink port function cmds to control RoCE") Fixes: e5b9642a33be ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e0bf81bf |
|
16-Aug-2022 |
Ariel Levkovich <lariel@nvidia.com> |
net/mlx5: check attr pointer validity before dereferencing it Fix attr pointer validity checks after it was already dereferenced. Fixes: cb0d54cbf948 ("net/mlx5e: Fix wrong source vport matching on tunnel rule") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
6fda078d |
|
02-Nov-2022 |
Oz Shlomo <ozsh@nvidia.com> |
net/mlx5e: TC, add support for meter mtu offload Initialize the meter object with the TC police mtu parameter. Use the hardware range destination to compare the pkt len to the mtu setting. Assign the range destination hit/miss ft to the police conform/exceed attributes. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
5a5624d1 |
|
03-Dec-2022 |
Oz Shlomo <ozsh@nvidia.com> |
net/mlx5e: E-Switch, handle flow attribute with no destinations Rules with drop action are not required to have a destination. Currently the destination list is allocated with the maximum number of destinations and passed to the fs_core layer along with the actual number of destinations. Remove redundant passing of dest pointer when count of dest is 0. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221203221337.29267-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
e5b9642a |
|
06-Dec-2022 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-Switch, Implement devlink port function cmds to control migratable Implement devlink port function commands to enable / disable migratable. This is used to control the migratable capability of the device. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7db98396 |
|
06-Dec-2022 |
Yishai Hadas <yishaih@nvidia.com> |
net/mlx5: E-Switch, Implement devlink port function cmds to control RoCE Implement devlink port function commands to enable / disable RoCE. This is used to control the RoCE device capabilities. This patch implements infrastructure which will be used by downstream patches that will add additional capabilities. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
47d0c500 |
|
06-Dec-2022 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Add generic getters for other functions caps Downstream patch requires to get other function GENERAL2 caps while mlx5_vport_get_other_func_cap() gets only one type of caps (general). Rename it to represent this and introduce a generic implementation of mlx5_vport_get_other_func_cap(). Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
dcf19b9c |
|
24-Nov-2022 |
Maor Dickman <maord@nvidia.com> |
net/mlx5e: TC, Add offload support for trap with additional actions TC trap action offload is currently supported only when trap is the sole action in the flow. This patch removes this limitation by changing trap action offload to not use MLX5_ATTR_FLAG_SLOW_PATH flag and instead set the flow destination table explicitly to be the slow table. This will allow offload of the additional actions. TC flow example: tc filter add dev $REP2 protocol ip prio 2 root \ flower skip_sw dst_mac $mac0 \ action mirred egress redirect dev $REP3 \ action pedit ex munge eth dst set $mac2 pipe \ action trap Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
42760d95 |
|
20-Nov-2022 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: Do early return when setup vports dests for slow path flow Adding flow flag cases in setup vport dests before the slow path case is incorrect as the slow path should take precedence. Current code doesn't show this importance so make the slow path case return early and separate from the other cases and remove the redundant comparison of it in the sample case. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2318b8bb |
|
17-Nov-2022 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Destroy legacy fdb table when needed The cited commit removes eswitch mode none. But when disabling sriov in legacy mode or changing from switchdev to legacy mode without sriov enabled, the legacy fdb table is not destroyed. It is not the right behavior. Destroy legacy fdb table in above two cases. Fixes: f019679ea5f2 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
6d942e40 |
|
16-Nov-2022 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, Set correctly vport destination The cited commit moved from using reformat_id integer to packet_reformat pointer which introduced the possibility of a null pointer dereference. When setting the packet reformat flag, the pkt_reformat pointer must exist, so checking MLX5_ESW_DEST_ENCAP is not enough; we need to make sure the pkt_reformat is valid and check for MLX5_ESW_DEST_ENCAP_VALID. If the dest encap valid flag does not exist then pkt_reformat can be either an invalid address or null. Also, to make sure we don't try to access an invalid pkt_reformat, set it to null when invalidated and invalidate it before calling the add flow code, as it's logically more correct and safer. Fixes: 2b688ea5efde ("net/mlx5: Add flow steering actions to fs_cmd shim layer") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e12de39c |
|
03-Nov-2022 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Set to legacy mode if failed to change switchdev mode No need to roll back to the other mode because it will probably fail again. Just set to legacy mode and clear the fdb table created flag, so that the fdb table will not be cleared again. Fixes: f019679ea5f2 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8c9cc1eb |
|
01-Oct-2022 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, Allow offloading fwd dest flow table with vport Before this commit a fwd dest flow table resulted in ignoring vport dests, which is incorrect because the combination is supported. With this commit the dests can be a mix of flow table and vport dests. There is still a limitation that there cannot be more than one flow table dest. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
430e2d5e |
|
18-Jul-2022 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, Move send to vport meta rule creation Move the creation of the rules from offloads fdb table init to per rep vport init. This way the driver will create the send to vport meta rule on any representor, e.g. SF representors. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
4a561817 |
|
18-Jul-2022 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, Split creating fdb tables into smaller chunks Split esw_create_offloads_fdb_tables() into smaller functions. This will help maintenance. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8ea7bcf6 |
|
05-Apr-2022 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5: E-Switch, Add default drop rule for unmatched packets The ft_offloads table serves to steer packets, which are from the eswitch, to the representor associated with the packets' source vport. Previously, if a packet's source vport or metadata was not associated with any representor, it was forwarded to the uplink representor. The representor got packets it shouldn't have as they weren't coming from the uplink vport. One such effect of this breakage can be observed if the uplink representor is attached to a bridge, where such illegal packets will be broadcast to the remaining ports, flooding the switch with illegal packets. In the case where IB loopback (e.g, SNAP) is enabled, all transmitted packets would be looped back, and received by the uplink representor, and result in an infinite feedback loop. Therefore, block this hole by adding a default drop rule to the ft_offloads table, so that all unmatched packets with no associated representor are dropped. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
b868c8fe |
|
15-Jul-2022 |
Dan Carpenter <dan.carpenter@oracle.com> |
net/mlx5: unlock on error path in esw_vfs_changed_event_handler() Unlock before returning on this error path. Fixes: f1bc646c9a06 ("net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
942fca7e |
|
06-Aug-2022 |
Eli Cohen <elic@nvidia.com> |
net/mlx5: Eswitch, Fix forwarding decision to uplink Make sure to modify the rule for uplink forwarding only for the case where destination vport number is MLX5_VPORT_UPLINK. Fixes: 94db33177819 ("net/mlx5: Support multiport eswitch mode") Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c0063a43 |
|
07-Jul-2022 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5e: Modify slow path rules to go to slow fdb While extending available range of supported chains/prios referenced commit also modified slow path rules to go to FT chain instead of actual slow FDB. However neither of existing users of the MLX5_ATTR_FLAG_SLOW_PATH flag (tunnel encap entries with invalid encap and flows with trap action) need to match on FT chain. After bridge offload was implemented packets of such flows can also be matched by bridge priority tables which is undesirable. Restore slow path flows implementation to redirect packets to slow_fdb. Fixes: 278d51f24330 ("net/mlx5: E-Switch, Increase number of chains and priorities") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
9153da46 |
|
06-Jul-2022 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5e: configure meter in flow action After police action is parsed, set meter data in flow action, so they can be used when adding FTE. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
606e6a72 |
|
18-May-2022 |
Michael Guralnik <michaelgur@nvidia.com> |
net/mlx5: Expose vnic diagnostic counters for eswitch managed vports Expose on vport group managers debug counters for their managed vports. Counters are exposed through debugfs, the directory will be present only for functions that are eswitch managers and only counters that are supported on their specific HW/FW will be exposed. Example: $ ls /sys/kernel/debug/mlx5/0000:08:00.0/esw/ pf sf_8 vf_0 vf_1 $ ls -l /sys/kernel/debug/mlx5/0000:08:00.0/esw/vf_0/vnic_diag/ cq_overrun quota_exceeded_command total_q_under_processor_handle invalid_command send_queue_priority_update_flow List of all counters added: total_q_under_processor_handle - number of queues in error state due to an async error or errored command. send_queue_priority_update_flow - number of QP/SQ priority/SL update events. cq_overrun - number of times CQ entered an error state due to an overflow. async_eq_overrun - number of times an EQ mapped to async events was overrun. comp_eq_overrun - number of times an EQ mapped to completion events was overrun. quota_exceeded_command - number of commands issued and failed due to quota exceeded. invalid_command - number of commands issued and failed due to any reason other than quota exceeded. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
973598d4 |
|
11-Jul-2022 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Remove devl_unlock from mlx5_devlink_eswitch_mode_set The callback mlx5_devlink_eswitch_mode_set() had unlocked devlink as a temporary workaround once devlink instance lock was added to devlink eswitch callbacks. Now that all flows triggered by this function that took devlink lock are using devl_ API and all parallel paths are locked we can remove this workaround. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
7b19119f |
|
11-Jul-2022 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Use devl_ API in mlx5e_devlink_port_register As part of the flows invoked by mlx5_devlink_eswitch_mode_set() get to mlx5_rescan_drivers_locked() which can call mlx5e_probe()/mlx5e_remove and register/unregister mlx5e driver ports accordingly. This can lead to deadlock once mlx5_devlink_eswitch_mode_set() will use devlink lock. Use devl_port_register/unregister() instead of devlink_port_register/unregister() and add devlink instance locks in the driver paths to this function to have it locked while calling devl_ API function. If remove or probe were called by module init or module cleanup flows, need to lock devlink just before calling devl_port_register(), otherwise it is called by attach/detach or register/unregister flow and we can have the flow locked. Added flag to distinguish between these cases. This will be used by the downstream patch to invoke mlx5_devlink_eswitch_mode_set() with devlink locked. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
f1bc646c |
|
11-Jul-2022 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register The function mlx5_esw_offloads_devlink_port_register() calls devlink_port_register() and devlink_rate_leaf_create(). Use devl_ API to call devl_port_register() and devl_rate_leaf_create() accordingly and add devlink instance lock in driver paths to this function. Similarly, use devl_ API to call devl_port_unregister() and devl_rate_leaf_destroy() in mlx5_esw_offloads_devlink_port_unregister() and ensure locking devlink instance lock on the paths to this function too. This will be used by the downstream patch to invoke mlx5_devlink_eswitch_mode_set() with devlink lock held. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
03f9c47d |
|
11-Jul-2022 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Use devl_ API for rate nodes destroy Use devl_rate_nodes_destroy() instead of devlink_rate_nodes_destroy(). Add devlink instance lock in the driver paths to this function to have it locked while calling devl_ API function. This will be used by the downstream patch to invoke mlx5_devlink_eswitch_mode_set() with devlink lock held. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
367dfa12 |
|
11-Jul-2022 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: Remove devl_unlock from mlx5_eswtich_mode_callback_enter The function mlx5_eswtich_mode_callback_enter() was added as a temporary workaround once devlink instance lock was added to devlink eswitch callbacks. However, code review and testing show that all the callbacks part to eswitch_mode_set don't take devlink instance lock in any flow and so unlocking devlink instance lock while entering these functions is not needed. Remove devl_lock from mlx5_eswtich_mode_callback_enter() and devl_unlock from mlx5_eswtich_mode_callback_exit(). Also remove the functions mlx5_eswtich_mode_callback_enter()/exit() as they are not needed any more. The callback eswitch_mode_set will be treated separately in the following patches. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
b6f2846a |
|
29-May-2022 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch: Change eswitch mode only via devlink command Enable or disable switchdev according to the eswitch mode set by devlink command. So it is not changed by other functions anymore. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
f019679e |
|
29-May-2022 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Remove dependency between sriov and eswitch mode Currently, there are three eswitch modes, none, legacy and switchdev. None is the default mode. Remove redundant none mode as eswitch mode should always be either legacy mode or switchdev mode. With this patch, there are two behavior changes: 1. Legacy becomes the default mode. When querying eswitch mode using devlink, a valid mode is always returned. 2. When disabling sriov, the eswitch mode will not change, only vfs are unloaded. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
3008e6a0 |
|
25-May-2022 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: E-Switch, pair only capable devices OFFLOADS paring using devcom is possible only on devices that support LAG. Filter based on lag capabilities. This fixes an issue where mlx5_get_next_phys_dev() was called without holding the interface lock. This issue was found when commit bc4c2f2e0179 ("net/mlx5: Lag, filter non compatible devices") added an assert that verifies the interface lock is held. WARNING: CPU: 9 PID: 1706 at drivers/net/ethernet/mellanox/mlx5/core/dev.c:642 mlx5_get_next_phys_dev+0xd2/0x100 [mlx5_core] Modules linked in: mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm ib_uverbs ib_core overlay fuse [last unloaded: mlx5_core] CPU: 9 PID: 1706 Comm: devlink Not tainted 5.18.0-rc7+ #11 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5_get_next_phys_dev+0xd2/0x100 [mlx5_core] Code: 02 00 75 48 48 8b 85 80 04 00 00 5d c3 31 c0 5d c3 be ff ff ff ff 48 c7 c7 08 41 5b a0 e8 36 87 28 e3 85 c0 0f 85 6f ff ff ff <0f> 0b e9 68 ff ff ff 48 c7 c7 0c 91 cc 84 e8 cb 36 6f e1 e9 4d ff RSP: 0018:ffff88811bf47458 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88811b398000 RCX: 0000000000000001 RDX: 0000000080000000 RSI: ffffffffa05b4108 RDI: ffff88812daaaa78 RBP: ffff88812d050380 R08: 0000000000000001 R09: ffff88811d6b3437 R10: 0000000000000001 R11: 00000000fddd3581 R12: ffff88815238c000 R13: ffff88812d050380 R14: ffff8881018aa7e0 R15: ffff88811d6b3428 FS: 00007fc82e18ae80(0000) GS:ffff88842e080000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f9630d1b421 CR3: 0000000149802004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> 
mlx5_esw_offloads_devcom_event+0x99/0x3b0 [mlx5_core] mlx5_devcom_send_event+0x167/0x1d0 [mlx5_core] esw_offloads_enable+0x1153/0x1500 [mlx5_core] ? mlx5_esw_offloads_controller_valid+0x170/0x170 [mlx5_core] ? wait_for_completion_io_timeout+0x20/0x20 ? mlx5_rescan_drivers_locked+0x318/0x810 [mlx5_core] mlx5_eswitch_enable_locked+0x586/0xc50 [mlx5_core] ? mlx5_eswitch_disable_pf_vf_vports+0x1d0/0x1d0 [mlx5_core] ? mlx5_esw_try_lock+0x1b/0xb0 [mlx5_core] ? mlx5_eswitch_enable+0x270/0x270 [mlx5_core] ? __debugfs_create_file+0x260/0x3e0 mlx5_devlink_eswitch_mode_set+0x27e/0x870 [mlx5_core] ? mutex_lock_io_nested+0x12c0/0x12c0 ? esw_offloads_disable+0x250/0x250 [mlx5_core] ? devlink_nl_cmd_trap_get_dumpit+0x470/0x470 ? rcu_read_lock_sched_held+0x3f/0x70 devlink_nl_cmd_eswitch_set_doit+0x217/0x620 Fixes: dd3fddb82780 ("net/mlx5: E-Switch, handle devcom events only for ports on the same device") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
94db3317 |
|
30-Jan-2022 |
Eli Cohen <elic@nvidia.com> |
net/mlx5: Support multiport eswitch mode Multiport eswitch mode is a LAG mode that allows adding rules that forward traffic to a specific physical port without being affected by LAG affinity configuration. This mode of operation is mutually exclusive with the other LAG modes used by multipath and bonding. To make the transition between the modes, we maintain a counter on the number of rules specifying one of the uplink representors as the target of mirred egress redirect action. An example of such rule would be: $ tc filter add dev enp8s0f0_0 prot all root flower dst_mac \ 00:11:22:33:44:55 action mirred egress redirect dev enp8s0f0 If the reference count just grows to one and LAG is not in use, we create the LAG in multiport eswitch mode. Other mode changes are not allowed while in this mode. When the reference count reaches zero, we destroy the LAG and let other modes be used if needed. Logic also changed such that if forwarding to some uplink destination cannot be guaranteed, we fail the operation so the rule will eventually be in software and not in hardware. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
cb0d54cb |
|
15-Mar-2022 |
Ariel Levkovich <lariel@nvidia.com> |
net/mlx5e: Fix wrong source vport matching on tunnel rule When OVS internal port is the vtep device, the first decap rule is matching on the internal port's vport metadata value and then changes the metadata to be the uplink's value. Therefore, following rules on the tunnel, in chain > 0, should avoid matching on internal port metadata and use the uplink vport metadata instead. Select the uplink's metadata value for the source vport match in case the rule is in chain greater than zero, even if the tunnel route device is internal port. Fixes: 166f431ec6be ("net/mlx5e: Add indirect tc offload of ovs internal port") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
14e426bf |
|
18-Mar-2022 |
Jakub Kicinski <kuba@kernel.org> |
devlink: hold the instance lock during eswitch_mode callbacks Make the devlink core hold the instance lock during eswitch_mode callbacks. Cheat in case of mlx5 (see the cover letter). Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
82e86a6c |
|
29-Nov-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: E-switch, remove special uplink ingress ACL handling As both uplinks set the same metadata there is no need to merge the ACL handling of both into a single one. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
0b0ea3c5 |
|
09-Mar-2021 |
Sunil Rani <sunrani@nvidia.com> |
net/mlx5: E-Switch, reserve and use same uplink metadata across ports When in switchdev mode wire traffic will hit the FDB in one of two scenarios. - Shared FDB, in that case traffic from both physical ports should be tagged by the same metadata value so a single FDB rule could catch traffic from both ports. - Two E-Switches, traffic from each physical port will hit the native E-Switch which means traffic from one physical port can't reach the E-Switch of the other one. Looking at those two scenarios it means we can always use the same metadata value to tag wire traffic regardless of the mode. Reserve a single metadata value to be used to tag wire traffic. Signed-off-by: Sunil Rani <sunrani@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
84ba8062 |
|
15-Dec-2021 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: Test CT and SAMPLE on flow attr Currently the mlx5_flow object contains a single mlx5_attr instance. However, multi table actions (e.g. CT) instantiate multiple attr instances. Prepare for multiple attr instances by testing for CT or SAMPLE flag on attr flags instead of flow flag. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e5d4e1da |
|
19-Dec-2021 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: Refactor eswitch attr flags to just attr flags The flags are flow attrs and not esw specific attr flags. Refactor to remove the esw prefix and move from eswitch.h to en_tc.h where struct mlx5_flow_attr exists. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
eeed226e |
|
05-Dec-2021 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: TC, Hold sample_attr on stack instead of pointer In later commit we are going to instantiate multiple attr instances for flow instead of single attr. Parsing TC sample allocates a new memory but there is no symmetric cleanup in the infrastructure. To avoid asymmetric alloc/free use sample_attr as part of the flow attr and not allocated and held as a pointer. This will avoid a cleanup leak when sample action is not on the first attr. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
07666c75 |
|
28-Jan-2022 |
Ariel Levkovich <lariel@nvidia.com> |
net/mlx5: Fix wrong limitation of metadata match on ecpf Match metadata support check returns false for ecpf device. However, this support does exist for ecpf and therefore this limitation should be removed to allow feature such as stacked devices and internal port offloaded to be supported. Fixes: 92ab1eb392c6 ("net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
de31854e |
|
24-Nov-2021 |
Dima Chumak <dchumak@nvidia.com> |
net/mlx5e: Fix nullptr on deleting mirroring rule Deleting a Tc rule with multiple outputs, one of which is internal port, like this one: tc filter del dev enp8s0f0_0 ingress protocol ip pref 5 flower \ dst_mac 0c:42:a1:d1:d0:88 \ src_mac e4:ea:09:08:00:02 \ action tunnel_key set \ src_ip 0.0.0.0 \ dst_ip 7.7.7.8 \ id 8 \ dst_port 4789 \ action mirred egress mirror dev vxlan_sys_4789 pipe \ action mirred egress redirect dev enp8s0f0_1 Triggers a call trace: BUG: kernel NULL pointer dereference, address: 0000000000000230 RIP: 0010:del_sw_hw_rule+0x2b/0x1f0 [mlx5_core] Call Trace: tree_remove_node+0x16/0x30 [mlx5_core] mlx5_del_flow_rules+0x51/0x160 [mlx5_core] __mlx5_eswitch_del_rule+0x4b/0x170 [mlx5_core] mlx5e_tc_del_fdb_flow+0x295/0x550 [mlx5_core] mlx5e_flow_put+0x1f/0x70 [mlx5_core] mlx5e_delete_flower+0x286/0x390 [mlx5_core] tc_setup_cb_destroy+0xac/0x170 fl_hw_destroy_filter+0x94/0xc0 [cls_flower] __fl_delete+0x15e/0x170 [cls_flower] fl_delete+0x36/0x80 [cls_flower] tc_del_tfilter+0x3a6/0x6e0 rtnetlink_rcv_msg+0xe5/0x360 ? rtnl_calcit.isra.0+0x110/0x110 netlink_rcv_skb+0x46/0x110 netlink_unicast+0x16b/0x200 netlink_sendmsg+0x202/0x3d0 sock_sendmsg+0x33/0x40 ____sys_sendmsg+0x1c3/0x200 ? copy_msghdr_from_user+0xd6/0x150 ___sys_sendmsg+0x88/0xd0 ? ___sys_recvmsg+0x88/0xc0 ? do_futex+0x10c/0x460 __sys_sendmsg+0x59/0xa0 do_syscall_64+0x48/0x140 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix by disabling offloading for flows matching esw_is_chain_src_port_rewrite() which have more than one output. Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e9d491a6 |
|
21-Oct-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-switch, move offloads mode callbacks to offloads file eswitch.c is mainly for common code between legacy and offloads mode. MAC address get and set via devlink is applicable only in offloads mode. Hence, move it to eswitch_offloads.c file. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e219440d |
|
23-Nov-2021 |
Maor Dickman <maord@nvidia.com> |
net/mlx5: E-Switch, Use indirect table only if all destinations support it When adding a rule with multiple destinations, the indirect table is used for all of the destinations if at least one of the destinations supports it. This can cause creation of invalid indirect tables for the destinations that don't support it. Fix it by using the indirect table only if all destinations support it. Fixes: a508728a4c8b ("net/mlx5e: VF tunnel RX traffic offloading") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
43a0696f |
|
20-Oct-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: E-Switch, fix single FDB creation on BlueField Always use MLX5_FLOW_TABLE_OTHER_VPORT flag when creating egress ACL table for single FDB. Not doing so on BlueField will make firmware fail the command. On BlueField the E-Switch manager is the ECPF (vport 0xFFFE) which is filled in the flow table creation command but as the other_vport field wasn't set the firmware complains about a bad parameter. This is different from a regular HCA where the E-Switch manager vport is the PF (vport 0x0). Passing MLX5_FLOW_TABLE_OTHER_VPORT will make the firmware happy both on BlueField and on regular HCAs without special condition for each. This fixes the below firmware syndrome: mlx5_cmd_check:819:(pid 571): CREATE_FLOW_TABLE(0x930) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x754a4) Fixes: db202995f503 ("net/mlx5: E-Switch, add logic to enable shared FDB") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c4c31767 |
|
01-Nov-2021 |
Raed Salem <raeds@nvidia.com> |
net/mlx5: E-Switch, return error if encap isn't supported On regular ConnectX HCAs getting encap mode isn't supported when the E-Switch is in NONE mode. Current code would return no error code when trying to get encap mode in such case which is wrong. Fix by returning error value to indicate failure to caller in such case. Fixes: 8e0aa4bc959c ("net/mlx5: E-switch, Protect eswitch mode changes") Signed-off-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d7751d64 |
|
20-May-2021 |
Paul Blakey <paulb@nvidia.com> |
net/mlx5: E-Switch, Fix resetting of encap mode when entering switchdev E-Switch encap mode is relevant only when in switchdev mode. The RDMA driver can query the encap configuration via mlx5_eswitch_get_encap_mode(). Make sure it returns the currently used mode and not the set one. This reverts the cited commit which reset the encap mode on entering switchdev and fixes the original issue properly. Fixes: 9a64144d683a ("net/mlx5: E-Switch, Fix default encap mode") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
166f431e |
|
29-Apr-2021 |
Ariel Levkovich <lariel@nvidia.com> |
net/mlx5e: Add indirect tc offload of ovs internal port Register callbacks for tc blocks of ovs internal port devices. This allows indirect offloading of rules that apply on such devices as the filter device. In case a rule is added to a tc block of an internal port, the mlx5 driver will implicitly add a matching on the internal port's unique vport metadata value to the rule's matching list. Therefore, only packets that previously hit a rule that redirects to an internal port and got the vport metadata overwritten to the internal port's unique metadata, can match on such indirect rule. Offloading of both ingress and egress tc blocks of internal ports is supported as opposed to other devices where only ingress block offloading is supported. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
27484f71 |
|
08-Jan-2021 |
Ariel Levkovich <lariel@nvidia.com> |
net/mlx5e: Offload tc rules that redirect to ovs internal port Allow offloading rules that redirect to ovs internal port ingress and egress. To support redirect to ingress device, offloading of REDIRECT_INGRESS action is added. When a tc rule redirects to ovs internal port, the hw rule will overwrite the input vport value in reg_c0 with a new vport metadata value that is mapped for this internal port using the internal port mapping api that was introduced in previous patches. After that the hw rule will redirect the packet to the root table to continue processing with the new vport metadata value. The new vport metadata value indicates that this packet is now arriving through an internal port and therefore should be processed using rules that apply on the same internal port as the filter device. Therefore, subsequent rules that apply on this internal port will have to match on the same vport metadata value as part of their matching keys to make sure the packet belongs to the internal port. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
4f4edcc2 |
|
29-Apr-2021 |
Ariel Levkovich <lariel@nvidia.com> |
net/mlx5: E-Switch, Add ovs internal port mapping to metadata support Adding infrastructure to map ovs internal port device to vport match metadata to support offload of rules with internal port as the filter device or as the destination device. The infrastructure allows adding and removing internal port device to an eswitch database and getting a unique vport metadata value to be placed and match on in reg_c0 when offloading rules that are coming from or going to an internal port. The new int port metadata can be written to the source port register in HW to indicate that current source port of the packet is the internal port and not one of the actual HW vports (uplink or VF). Using this method, it is possible to offload TC rules with an OVS internal port as their destination port (overwriting the src vport register) or as the filter port (matching on the value of the src vport register and making sure it matches to the internal port's value). There is also a need to handle a miss case where the packet's src port value was changed in HW to an internal port but a following rule which matches on this new src port value wasn't found in HW. In such case, the packet will be forwarded to the driver with metadata which allows driver to restore the info of the internal port's netdevice. Once this info is restored, the uplink driver can forward the packet to the relevant netdevice in SW. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
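The add/remove/lookup behaviour described in this message can be pictured with a small model. This is only a sketch in Python, not the driver's code: the class name `IntPortTable`, its methods, the use of an ifindex as the key, and the reference counting are all assumptions made for illustration.

```python
class IntPortTable:
    """Toy registry: one unique metadata value per internal port device.

    Hypothetical model; names and refcounting are illustrative assumptions.
    """

    def __init__(self, first_metadata=1):
        self._next = first_metadata
        self._by_dev = {}   # ifindex -> [metadata, refcount]
        self._by_meta = {}  # metadata -> ifindex, used to restore the netdev on a miss

    def get(self, ifindex):
        """Add the device if needed and return its unique metadata value."""
        entry = self._by_dev.get(ifindex)
        if entry is None:
            entry = [self._next, 0]
            self._next += 1
            self._by_dev[ifindex] = entry
            self._by_meta[entry[0]] = ifindex
        entry[1] += 1
        return entry[0]

    def put(self, ifindex):
        """Drop one reference; remove the device when no rule uses it."""
        entry = self._by_dev[ifindex]
        entry[1] -= 1
        if entry[1] == 0:
            del self._by_meta[entry[0]]
            del self._by_dev[ifindex]

    def restore(self, metadata):
        """Miss path: map metadata carried by the packet back to a device."""
        return self._by_meta.get(metadata)
```

In this model, `restore()` plays the role of the miss case in the message: the packet carries the metadata value, and software uses it to find the internal port's netdevice again.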
|
#
40888162 |
|
20-Sep-2021 |
Maor Dickman <maord@nvidia.com> |
net/mlx5: E-Switch, Use dynamic alloc for dest array Use dynamic allocation for the dest array in preparation for the next patch which increase MLX5_MAX_FLOW_FWD_VPORTS and will cause stack allocation to be bigger than 1024 bytes. Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2ec16ddd |
|
16-Sep-2021 |
Rongwei Liu <rongweil@nvidia.com> |
net/mlx5: Introduce new device index wrapper Downstream patches. Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
6ba2e2b3 |
|
07-Sep-2021 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5e: Support accept action Support TC generic 'accept' action in mlx5 by introducing MLX5_ESW_ATTR_FLAG_ACCEPT attribute flag. Flag has similar semantics to existing MLX5_ESW_ATTR_FLAG_SLOW_PATH flag, however, dedicated flag is required because existing 'slow path' flag can be flipped by tunneling subsystem when neighbor changes state. Introduce new helper function mlx5_esw_attr_flags_skip() to check whether attribute flags for 'slow path' or 'accept' action are set and use it in eswitch code instead of direct bit manipulation. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
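The helper described here amounts to a mask test over two flag bits. A minimal sketch in Python, assuming made-up bit positions (the driver's real values are not given in this log); `attr_flags_skip` is a hypothetical stand-in for mlx5_esw_attr_flags_skip().

```python
# Illustrative bit values only; the driver's actual values are not shown in this log.
ESW_ATTR_FLAG_SLOW_PATH = 1 << 0
ESW_ATTR_FLAG_ACCEPT = 1 << 1


def attr_flags_skip(flags: int) -> bool:
    """Return True when either the 'slow path' or the 'accept' flag is set."""
    return bool(flags & (ESW_ATTR_FLAG_SLOW_PATH | ESW_ATTR_FLAG_ACCEPT))
```

Keeping the two bits separate means one of them can be flipped (as the tunneling subsystem does with 'slow path') without disturbing the other, while callers still test both through one helper.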
|
#
806bf340 |
|
28-Sep-2021 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
net/mlx5: Use kvcalloc() instead of kvzalloc() Use 2-factor argument form kvcalloc() instead of kvzalloc(). Link: https://github.com/KSPP/linux/issues/162 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
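The motivation for the 2-factor form is that the count and element size are multiplied with an overflow check instead of an open-coded `n * size`. The arithmetic can be sketched in Python, assuming a 64-bit size_t; both function names are hypothetical and only model the size computation, not the allocation itself.

```python
SIZE_MAX = 2**64 - 1  # assumption: 64-bit size_t


def open_coded_size(n, size):
    """Model of an unchecked n * size in C: the product wraps silently."""
    return (n * size) & SIZE_MAX


def checked_size(n, size):
    """Model of the checked multiply: None stands in for a failed allocation."""
    total = n * size
    return None if total > SIZE_MAX else total
```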
|
#
2741f223 |
|
21-Jun-2021 |
Chris Mi <cmi@nvidia.com> |
net/mlx5e: TC, Support sample offload action for tunneled traffic Currently the sample offload actions send the encapsulated packet to software. This commit decapsulates the packet before performing the sampling and set the tunnel properties on the skb metadata fields to make the behavior consistent with OVS sFlow. If decapsulating first, we can't use the same match like before in default table. So instantiate a post action instance to continue processing the action list. If HW can preserve reg_c, also use the post action instance. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
bcd6740c |
|
18-Aug-2021 |
Chris Mi <cmi@nvidia.com> |
net/mlx5e: Move sample attribute to flow attribute Currently it is in eswitch attribute. Move it to flow attribute to reflect the change in previous patch. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
db202995 |
|
03-Aug-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: E-Switch, add logic to enable shared FDB Shared FDB allows directing traffic from all the vports in the HCA to a single eswitch. To do that, three things are needed:
1) Point the ingress ACL of the slave uplink to that of the master. With this, wire traffic from both uplinks will reach the same eswitch with the same metadata where a single steering rule can catch traffic from both ports.
2) Set the FDB root flow table of the slave's eswitch to that of the master. As this flow table can change dynamically make sure to sync it on any set root flow table FDB command. This will make sure traffic from SFs, VFs, ECPFs and PFs reach the master eswitch.
3) Split wire traffic at the eswitch manager egress ACL so that it's directed to the native eswitch manager. We only treat wire traffic from both ports the same at the eswitch level. If such traffic wasn't handled in the eswitch it needs to reach the right representor to be processed by software. For example LACP packets should *always* reach the right uplink representor for correct operation.
Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
cac1eb2c |
|
03-Aug-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: Lag, properly lock eswitch if needed Currently when doing hardware lag we check the eswitch mode, but as this isn't done under a lock the check isn't valid. As the code needs to sync between two different devices, extra care is needed:
- When going to change eswitch mode, if hardware lag is active destroy it.
- While changing eswitch modes block any hardware bond creation.
- Delay handling bonding events until there are no mode changes in progress.
- When attaching a new mdev to lag, block until there is no mode change in progress. In order for the mode change to finish the interface lock will have to be taken. Release the lock and sleep for 100ms to allow forward progress. As this is a very rare condition (can happen if the user unbinds and binds a PCI function while also changing eswitch mode of the other PCI function) it has no real world impact.
As taking multiple eswitch mode locks is now required, lockdep will complain about a possible deadlock. Register a key per eswitch to make lockdep happy. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
898b0786 |
|
03-Aug-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: Add send to vport rules on paired device When two mlx5 devices are paired in switchdev mode, always offload the send-to-vport rule to the peer E-Switch. This allows to abstract the logic when this is really necessary (single FDB) and combine the logic of both cases into one. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c8e6a9e6 |
|
03-Aug-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: E-Switch, Add event callback for representors This callback will allow to notify representors about relevant events when in OFFLOADS mode. In downstream patches, this will be used to notify about PAIR/UNPAIR devcom events. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2198b932 |
|
03-Aug-2021 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: Use shared mappings for restoring from metadata FTEs are added with mapped metadata which is saved per eswitch. When uplink reps are bonded and we are in a single FDB mode, we could fail to find metadata which was stored on one eswitch mapping but not the other or with a different id. To resolve this issue use shared mapping between eswitch ports. We do not have any conflict using a single mapping, for a type, between the ports. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d0444254 |
|
03-Aug-2021 |
Ariel Levkovich <lariel@nvidia.com> |
net/mlx5: E-Switch, set flow source for send to uplink rule Set the flow source param to local vport for the uplink rep send-to-vport rule. This will comply with the recent changes in SW steering that use the flow source as an indication for the rule type - rx or tx. Since the uplink send-to-vport rule is forwarding traffic to the wire it has to indicate that it is an sx rule and can't use the any port value in the flow source. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
979bf468 |
|
03-Aug-2021 |
Mark Bloch <mbloch@nvidia.com> |
{net, RDMA}/mlx5: Extend send to vport rules In shared FDB there is only one eswitch which is active and it receives traffic from all representors and all vports in the HCA. While the Ethernet representor will always reside on its native PF, the IB representor will not. Extend send to vport rule creation to support such flows. Need to account for the source vport that sends the traffic (on which the representor resides) and the target eswitch which the traffic will reach. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
bcd68c04 |
|
22-Jul-2021 |
Jiapeng Chong <jiapeng.chong@linux.alibaba.com> |
net/mlx5: Fix missing return value in mlx5_devlink_eswitch_inline_mode_set() The return value is missing in this code path; assign '0' to the return variable 'err'. This eliminates the following smatch warning: drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:3083 mlx5_devlink_eswitch_inline_mode_set() warn: missing error code 'err'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Fixes: 8e0aa4bc959c ("net/mlx5: E-switch, Protect eswitch mode changes") Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c85a6b8f |
|
28-Jul-2021 |
Aya Levin <ayal@nvidia.com> |
net/mlx5: Block switchdev mode while devlink traps are active Since switchdev mode can't support devlink traps, verify there are no active devlink traps before moving eswitch to switchdev mode. If there are active traps, prevent the switchdev mode configuration. Fixes: eb3862a0525d ("net/mlx5e: Enable traps according to link state") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
dd3fddb8 |
|
02-Jun-2021 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, handle devcom events only for ports on the same device This is the same check that LAG mode uses to decide whether to enable lag. This fixes adding peer miss rules when lag is not supported, and even incorrect rules in socket direct mode. Also fix the incorrect comment on mlx5_get_next_phys_dev(), as flow #1 doesn't exist. Fixes: ac004b832128 ("net/mlx5e: E-Switch, Add peer miss rules") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c6719725 |
|
22-Jun-2021 |
Maor Dickman <maord@nvidia.com> |
net/mlx5: E-Switch, Set destination vport vhca id only when merged eswitch is supported The destination vport vhca id valid flag is set only when merged eswitch is supported. Change the destination vport vhca id value to also be set only when merged eswitch is supported. Fixes: e4ad91f23f10 ("net/mlx5e: Split offloaded eswitch TC rules for port mirroring") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
ec3be887 |
|
04-Mar-2021 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5: Create TC-miss priority and table In order to adhere to kernel software datapath model bridge offloads must come after TC and NF FDBs. Following patches in this series add new FDB priority for bridge after FDB_FT_OFFLOAD. However, since netfilter offload is implemented with unmanaged tables, its miss path is not automatically connected to next priority and requires the code to manually connect with slow table. To keep bridge offloads encapsulated and not mix it with eswitch offloads, create a new FDB_TC_MISS priority between FDB_FT_OFFLOAD and FDB_SLOW_PATH:

          +
          |
+---------v----------+
|                    |
|   FDB_TC_OFFLOAD   |
|                    |
+---------+----------+
          |
          |
          |
+---------v----------+
|                    |
|   FDB_FT_OFFLOAD   |
|                    |
+---------+----------+
          |
          |
          |
+---------v----------+
|                    |
|    FDB_TC_MISS     |
|                    |
+---------+----------+
          |
          |
          |
+---------v----------+
|                    |
|   FDB_SLOW_PATH    |
|                    |
+---------+----------+
          |
          v

Initialize the new priority with single default empty managed table and use the table as TC/NF miss path instead of slow table. This approach allows bridge offloads to be created as new FDB namespace priority between FDB_TC_MISS and FDB_SLOW_PATH without exposing its internal tables to any other modules since miss path of managed TC-miss table is automatically wired to next priority. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
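The ordering argument in this message can be illustrated with a toy model of the priority chain. This is a sketch in Python under simplifying assumptions: every priority is treated as a managed table whose miss falls through to the next one, and `FDB_BRIDGE_EXAMPLE` is a hypothetical name for the bridge priority that later patches add (the log does not name it).

```python
FDB_PRIORITIES = ["FDB_TC_OFFLOAD", "FDB_FT_OFFLOAD", "FDB_TC_MISS", "FDB_SLOW_PATH"]


def insert_after(priorities, existing, new):
    """Return a new chain with `new` placed right after `existing`."""
    idx = priorities.index(existing)
    return priorities[: idx + 1] + [new] + priorities[idx + 1 :]


def walk(packet, priorities, rules):
    """Visit priorities in order; a miss in one falls through to the next.

    `rules` maps a priority name to a predicate over the packet.
    Returns (matching priority or None, list of priorities visited).
    """
    visited = []
    for prio in priorities:
        visited.append(prio)
        match = rules.get(prio)
        if match is not None and match(packet):
            return prio, visited
    return None, visited


# The empty FDB_TC_MISS table never matches, so its miss path leads straight
# to whatever priority follows it, here a hypothetical bridge priority.
with_bridge = insert_after(FDB_PRIORITIES, "FDB_TC_MISS", "FDB_BRIDGE_EXAMPLE")
```

Because TC/NF code only ever points at FDB_TC_MISS, a priority inserted after it is reached without TC/NF knowing about its tables, which is the encapsulation the message is after.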
|
#
2a2c84fa |
|
19-May-2021 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: Fix adding encap rules to slow path On some devices the ignore flow level cap is not supported and we shouldn't use it. Setting the dest ft with mlx5_chains_get_tc_end_ft() already gives the correct end ft if ignore flow level cap is supported or not. Fixes: 39ac237ce009 ("net/mlx5: E-Switch, Refactor chains and priorities") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
f1b9acd3 |
|
08-Mar-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: SF, Extend SF table for additional SF id range Extend the SF table to cover the additional SF id range of an external controller. A user optionally provides the external controller number when creating an SF on the external controller. An example on eswitch system:

$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev

$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
47dd7e60 |
|
18-Mar-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-Switch, Use xarray for vport number to vport and rep mapping Currently vport number to vport and its representor are mapped using an array and an index. Vport numbers of different types of functions are not contiguous. Adding new such discontiguous range using index and number mapping is increasingly complex and hard to maintain. Hence, maintain an xarray of vport and rep whose lookup is done based on the vport number. Each VF and SF entry is marked with a xarray mark to identify the function type. Additionally PF and VF needs special handling for legacy inline mode. They are additionally marked as host function using additional HOST_FN mark. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
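The lookup-by-number-plus-marks idea can be sketched without the kernel xarray. The Python model below is an assumption-laden illustration: a dict stands in for the xarray, string labels stand in for xarray marks, and the class and the vport numbers used with it are hypothetical.

```python
class VportTable:
    """Toy stand-in for an xarray keyed by (non-contiguous) vport number."""

    def __init__(self):
        self._vports = {}  # vport number -> vport object
        self._marks = {}   # mark name -> set of vport numbers

    def insert(self, vport_num, vport, marks=()):
        self._vports[vport_num] = vport
        for mark in marks:
            self._marks.setdefault(mark, set()).add(vport_num)

    def lookup(self, vport_num):
        """Direct lookup by vport number; no index arithmetic needed."""
        return self._vports.get(vport_num)

    def for_each_marked(self, mark):
        """Iterate entries carrying `mark`, in ascending vport number."""
        for num in sorted(self._marks.get(mark, ())):
            yield num, self._vports[num]
```

A VF would be inserted with marks such as `("VF", "HOST_FN")` and an SF with `("SF",)`, so that iterating "all host functions" or "all SFs" does not depend on the numbers forming a contiguous range.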
|
#
6308a5f0 |
|
02-Mar-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-Switch, Make vport number u16 Vport number is 16-bit field in hardware. Make it u16. Move location of vport in the structure so that it reduces a hole in the structure. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
7bf481d7 |
|
30-Oct-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-Switch, let user to enable disable metadata Currently each packet inserted in eswitch is tagged with an internal metadata to indicate source vport. Metadata tagging is not always needed. Metadata insertion is needed for multi-port RoCE, failover between representors and stacked devices. In many other cases, metadata enablement is not needed. Metadata insertion slows down the packet processing rate of the E-switch when it is in switchdev mode. The table below shows the performance gain with metadata disabled for VXLAN offload rules in both SMFS and DMFS steering mode on ConnectX-5 device.

----------------------------------------------
| steering | metadata | pkt size | rx pps    |
| mode     |          |          | (million) |
----------------------------------------------
| smfs     | disabled | 128Bytes | 42        |
----------------------------------------------
| smfs     | enabled  | 128Bytes | 36        |
----------------------------------------------
| dmfs     | disabled | 128Bytes | 42        |
----------------------------------------------
| dmfs     | enabled  | 128Bytes | 36        |
----------------------------------------------

Hence, allow user to disable metadata using driver specific devlink parameter. Metadata setting of the eswitch is applicable only for the switchdev mode. Example to show and disable metadata before changing eswitch mode:

$ devlink dev param show pci/0000:06:00.0 name esw_port_metadata
pci/0000:06:00.0:
  name esw_port_metadata type driver-specific
    values:
      cmode runtime value true

$ devlink dev param set pci/0000:06:00.0 \
    name esw_port_metadata value false cmode runtime

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> --- changelog: v1->v2: - added performance numbers in commit log - updated commit log and documentation for switchdev mode - added explicit note on when user can disable metadata in documentation
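The size of the gain in the table can be worked out directly from its two rx pps figures. A short Python calculation, where `relative_gain` is just an illustrative helper name:

```python
def relative_gain(disabled_mpps, enabled_mpps):
    """Gain of the metadata-disabled rate relative to the enabled rate."""
    return (disabled_mpps - enabled_mpps) / enabled_mpps


# Figures from the table: 42 Mpps with metadata disabled, 36 Mpps enabled.
gain = relative_gain(42, 36)
print(f"{gain:.1%}")  # prints "16.7%"
```

The same ratio applies to both steering modes, since the table reports identical figures for smfs and dmfs.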
|
#
f94d6389 |
|
21-Sep-2020 |
Chris Mi <cmi@nvidia.com> |
net/mlx5e: TC, Add support to offload sample action The following diagram illustrates the hardware model for tc sample action:

             +---------------------+
             + original flow table +
             +---------------------+
             +   original match    +
             +---------------------+
                        |
                        v
+------------------------------------------------+
+              Flow Sampler Object               +
+------------------------------------------------+
+                  sample ratio                  +
+------------------------------------------------+
+    sample table id    |    default table id    +
+------------------------------------------------+
            |                        |
            v                        v
+-----------------------------+  +----------------------------------------+
+        sample table         +  + default table per <vport, chain, prio> +
+-----------------------------+  +----------------------------------------+
+ forward to management vport +  +             original match             +
+-----------------------------+  +----------------------------------------+
                                 +             other actions              +
                                 +----------------------------------------+

The sample action is translated to a goto flow table object destination which samples packets according to the provided sample ratio. Sampled packets are duplicated. One copy is processed by a termination table, named the sample table, which sends the packet to the eswitch manager port (that will be processed by software). The second copy is processed by the default table which executes the subsequent actions. The default table is created per <vport, chain, prio> tuple as rules with different prios and chains may overlap. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
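The duplicate-and-continue behaviour of the sampler object can be modelled in a few lines. This Python sketch makes one loud assumption: it samples every Nth packet deterministically so the result is reproducible, whereas how the hardware picks packets for a given ratio is not stated in this log. The function name is hypothetical.

```python
def flow_sampler(packets, sample_ratio):
    """Split traffic the way the diagram describes.

    Every packet continues to the default table; roughly one in
    `sample_ratio` packets is also copied to the sample table.
    Deterministic selection is an assumption made for illustration.
    """
    sample_table = []   # copies forwarded to the eswitch manager port
    default_table = []  # packets that go on to the remaining actions
    for i, pkt in enumerate(packets):
        if i % sample_ratio == 0:
            sample_table.append(pkt)  # duplicate for software
        default_table.append(pkt)     # original keeps being processed
    return sample_table, default_table
```

The point of the model is that sampling never removes a packet from the normal path: the default table sees all traffic, and the sample table only ever receives copies.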
|
#
c9355682 |
|
30-Aug-2020 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: Instantiate separate mapping objects for FDB and NIC tables Currently, the u32 chain id is mapped to u16 value which is stored on the lower 16 bits of reg_c0 for FDB and reg_b for NIC tables. The mapping is internally maintained by the chains object. However, with the introduction of reg_c0 objects the fdb may store more than just the chain id on reg_c0. This is not relevant for NIC tables. Separate the chains mapping instantiation for FDB and NIC tables. Remove the mapping from the chains object. For FDB tables, create the mapping per eswitch. For NIC tables, create the mapping per tc table. Pass the corresponding mapping pointer when creating the chains object. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
a91d98a0 |
|
10-Sep-2020 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: Map register values to restore objects Currently reg_c0 lower 16 bits and reg_b are used to store the chain id that missed in FDB and NIC tables accordingly. However, the registers' values may index a restore object, rather than a single u32 value. Different object types can be used to restore mutually exclusive contexts such as chain id and sample group id. Use the mapping object to associate an index with a restore object as a prestep for supporting additional restore types. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
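The step from "register holds a chain id" to "register holds an index to a typed restore object" can be sketched as follows. This is a Python model under assumptions: the names `RestoreObj` and `Mapping`, the two kind labels, and the 16-bit index limit (taken from the "lower 16 bits" wording) are illustrative, not the driver's definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestoreObj:
    kind: str   # e.g. "chain" or "sample"; mutually exclusive contexts
    value: int  # chain id or sample group id


class Mapping:
    """Toy mapping object: small register-sized index <-> restore object."""

    def __init__(self, max_index=0xFFFF):
        self._max = max_index
        self._by_obj = {}
        self._by_index = {}

    def add(self, obj):
        """Return the index for `obj`, allocating a new one if needed."""
        if obj in self._by_obj:
            return self._by_obj[obj]
        index = len(self._by_index) + 1
        if index > self._max:
            raise OverflowError("no free index fits in the register field")
        self._by_obj[obj] = index
        self._by_index[index] = obj
        return index

    def find(self, index):
        """On a miss, translate the register value back to its object."""
        return self._by_index.get(index)
```

With typed objects, a chain id and a sample group id that happen to share the same number still map to different indices, so the miss handler can tell which context to restore.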
|
#
c796bb7c |
|
30-Aug-2020 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Generalize per vport table API Currently, per vport table was used only for port mirroring actions. However, sample action will also require a per vport table instance. Generalize the vport table API to work with multiple namespaces where each namespace manages its own vport table instance. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
0a9e2307 |
|
14-Jan-2021 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Rename functions to follow naming convention. Public api starts with mlx5 and remove mlx5 for non-public api. Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
4c7f4028 |
|
30-Aug-2020 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Move vport table functions to a new file Currently, the vport table functions are in common eswitch offload file. This file is too big. Move the vport table create, delete and lookup functions to a separate file. Put the file in esw directory. Pre-step for generalizing its functionality for serving both the mirroring and the sample features. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
ea6c8635 |
|
22-Mar-2021 |
Wan Jiabing <wanjiabing@vivo.com> |
net: ethernet: indir_table.h is included twice indir_table.h has been included at line 41, so remove the duplicate one at line 43. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7dc84de9 |
|
16-Sep-2020 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, Protect changing mode while adding rules We re-use the native NIC port net device instance for the Uplink representor, a driver currently cannot unbind TC setup callback actively, hence protect changing E-Switch mode while adding rules. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c55479d0 |
|
16-Sep-2020 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, Change mode lock from mutex to rw semaphore E-Switch mode change routine will take the write lock to prevent any consumer to access the E-Switch resources while E-Switch is going through a mode change. In the next patch E-Switch consumers (e.g vport representors) will take read_lock prior to accessing E-Switch resources to prevent E-Switch mode changing in the middle of the operation. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
865d6d1c |
|
19-Oct-2020 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: Move devlink port register and unregister calls We will re-use the native NIC port net device instance for the Uplink representor. As such we also don't want to unregister/register the devlink port as part of the profile. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
3a46f4fb |
|
11-Mar-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: E-Switch, Refactor send to vport to be more generic Now that each representor stores a pointer to the managing E-Switch use that information when creating the send-to-vport rules. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
59c904c8 |
|
11-Mar-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: E-Switch, Add eswitch pointer to each representor Store the managing E-Switch of each representor. This will be used when a representor is created on eswitch manager 0 but the vport belongs to eswitch manager 1. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
7d97822a |
|
11-Mar-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: E-Switch, Add match on vhca id to default send rules Match on the vhca id of the E-Switch owner when creating the send-to-vport representor rules. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
027d7166 |
|
11-Mar-2021 |
Zheng Yongjun <zhengyongjun3@huawei.com> |
net/mlx5: simplify the return expression of mlx5_esw_offloads_pair() Simplify the return expression. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
9f4d9283 |
|
09-Mar-2021 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: Alloc flow spec using kvzalloc instead of kzalloc flow spec is not small and we do allocate it using kvzalloc in most places of the driver. fix rest of the places to use kvzalloc to avoid failure in allocation when memory is too fragmented. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
7bef147a |
|
03-Dec-2020 |
Saeed Mahameed <saeedm@nvidia.com> |
net/mlx5: Don't skip vport check Users of mlx5_eswitch_get_vport() are required to check the return value prior to passing mlx5_vport further. Fix all the places that skip that check. Reviewed-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e929e3da |
|
15-Mar-2021 |
Maor Dickman <maord@nvidia.com> |
net/mlx5: E-switch, Create vport miss group only if src rewrite is supported The send-to-vport miss group was added in order to support traffic recirculation to the root table with metadata source rewrite. This group is also created when source rewrite isn't supported. Fix by creating the send-to-vport miss group only if source rewrite is supported by FW. Fixes: 8e404fefa58b ("net/mlx5e: Match recirculated packet miss in slow table using reg_c1") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
f574531a |
|
01-Mar-2021 |
Maor Dickman <maord@nvidia.com> |
net/mlx5: Disable VF tunnel TX offload if ignore_flow_level isn't supported VF tunnel TX traffic offload adds a flow which forwards to flow tables with a lower level, which isn't supported on all FW versions and may cause firmware to fail with a syndrome. Fix by enabling VF tunnel TX offload only if the flow table capability ignore_flow_level is enabled. Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8e404fef |
|
31-Aug-2020 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5e: Match recirculated packet miss in slow table using reg_c1 The previous patch in the series, which implements the stack devices RX path, adds indirect table rules that match on tunnel VNI. After such a rule is created all tunnel traffic is recirculated to the root table. However, a recirculated packet might not match any rule installed in the table (for example, when IP traffic follows ARP traffic). In that case packets appear on the representor of the tunnel endpoint VF instead of being redirected to the VF itself. Extend the slow table with an additional flow group that matches on reg_c0 (source port value set by indirect tables implemented by the previous patch in the series) and reg_c1 (special 0xFFF mark). When creating offloads fdb tables, install one rule per VF vport to match on recirculated miss packets and redirect them to the appropriate VF vport. Modify indirect tables code to also rewrite reg_c1 with the special 0xFFF mark. Implementation reuses reg_c1 tunnel id bits. This is safe to do because recirculated packets are always matched before decapsulation. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
a508728a |
|
25-Jan-2021 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5e: VF tunnel RX traffic offloading When tunnel endpoint is on VF the encapsulated RX traffic is exposed on the representor of the VF without any further processing of rules installed on the VF. Detect such case by checking if the device returned by route lookup in decap rule handling code is a mlx5 VF and handle it with new redirection tables API. Example TC rules for VF tunnel traffic:

1. Rule that encapsulates the tunneled flow and redirects packets from source VF rep to tunnel device:

$ tc -s filter show dev enp8s0f0_1 ingress
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
  dst_mac 0a:40:bd:30:89:99
  src_mac ca:2e:a7:3f:f5:0f
  eth_type ipv4
  ip_tos 0/0x3
  ip_flags nofrag
  in_hw in_hw_count 1
    action order 1: tunnel_key set
    src_ip 7.7.7.5
    dst_ip 7.7.7.1
    key_id 98
    dst_port 4789
    nocsum
    ttl 64 pipe
     index 1 ref 1 bind 1 installed 411 sec used 411 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    no_percpu
    used_hw_stats delayed

    action order 2: mirred (Egress Redirect to device vxlan_sys_4789) stolen
    index 1 ref 1 bind 1 installed 411 sec used 0 sec
    Action statistics:
    Sent 5615833 bytes 4028 pkt (dropped 0, overlimits 0 requeues 0)
    Sent software 0 bytes 0 pkt
    Sent hardware 5615833 bytes 4028 pkt
    backlog 0b 0p requeues 0
    cookie bb406d45d343bf7ade9690ae80c7cba4
    no_percpu
    used_hw_stats delayed

2. Rule that redirects from tunnel device to UL rep:

$ tc -s filter show dev vxlan_sys_4789 ingress
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
  dst_mac ca:2e:a7:3f:f5:0f
  src_mac 0a:40:bd:30:89:99
  eth_type ipv4
  enc_dst_ip 7.7.7.5
  enc_src_ip 7.7.7.1
  enc_key_id 98
  enc_dst_port 4789
  enc_tos 0
  ip_flags nofrag
  in_hw in_hw_count 1
    action order 1: tunnel_key unset pipe
     index 2 ref 1 bind 1 installed 434 sec used 434 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    used_hw_stats delayed

    action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen
    index 4 ref 1 bind 1 installed 434 sec used 0 sec
    Action statistics:
    Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0)
    Sent software 0 bytes 0 pkt
    Sent hardware 129936 bytes 1082 pkt
    backlog 0b 0p requeues 0
    cookie ac17cf398c4c69e4a5b2f7aabd1b88ff
    no_percpu
    used_hw_stats delayed

Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
34ca6535 |
|
24-Jan-2021 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5: E-Switch, Indirect table infrastructure Indirect table infrastructure is used to allow fully processing VF tunnel traffic in hardware. Kernel software model uses two TC rules for such traffic: UL rep to tunnel device, then tunnel VF rep to destination VF rep. To implement such pipeline driver needs to program the hardware after matching on UL rule to overwrite source vport from UL to tunnel VF and recirculate the packet to the root table to allow matching on the rule installed on tunnel VF. For this indirect table matches all encapsulated traffic by tunnel parameters and all other IP traffic is sent to tunnel VF by the miss rule. Indirect table API overview: - mlx5_esw_indir_table_{init|destroy}() - init and destroy opaque indirect table object. - mlx5_esw_indir_table_get() - get or create new table according to vport id and IP version. Table has following pre-created groups: recirculation group with match on ethertype and VNI (rules that match encapsulated packets are installed to this group) and forward group with default/miss rule that forwards to vport of tunnel endpoint VF (rule for regular non-encapsulated packets). - mlx5_esw_indir_table_put() - decrease reference to the indirect table and matching rule (for encapsulated traffic). - mlx5_esw_indir_table_needed() - check that in_port is an uplink port and out_port is VF on the same eswitch, verify that the rule is for IP traffic and source port rewrite functionality can be used. - mlx5_esw_indir_table_decap_vport() - function returns decap vport of flow attribute. Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
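The get/put pair listed in the indirect table API overview of 34ca6535 follows the usual reference-counting pattern: get returns an existing table for a (vport, IP version) key or creates one, and put drops a reference. The following is a minimal user-space sketch of that pattern only. Every name in it is hypothetical, the fixed-size array stands in for whatever container the driver actually uses, and no flow groups or rules are modelled.

```c
#include <stddef.h>

#define SKETCH_MAX_TABLES 8

/* One hypothetical indirect table, keyed by vport and IP version. */
struct sketch_indir_entry {
	unsigned int vport;
	unsigned int ip_version;	/* 4 or 6 */
	unsigned int refcnt;		/* 0 means the slot is free */
};

struct sketch_indir_table {
	struct sketch_indir_entry entries[SKETCH_MAX_TABLES];
};

/* Find a live entry for the key, or return NULL. */
static struct sketch_indir_entry *
sketch_indir_lookup(struct sketch_indir_table *t, unsigned int vport,
		    unsigned int ip_version)
{
	for (size_t i = 0; i < SKETCH_MAX_TABLES; i++) {
		struct sketch_indir_entry *e = &t->entries[i];

		if (e->refcnt && e->vport == vport && e->ip_version == ip_version)
			return e;
	}
	return NULL;
}

/* Get: take a reference on an existing entry, or claim a free slot.
 * Returns NULL when every slot is in use. */
static struct sketch_indir_entry *
sketch_indir_get(struct sketch_indir_table *t, unsigned int vport,
		 unsigned int ip_version)
{
	struct sketch_indir_entry *e = sketch_indir_lookup(t, vport, ip_version);

	if (e) {
		e->refcnt++;
		return e;
	}
	for (size_t i = 0; i < SKETCH_MAX_TABLES; i++) {
		e = &t->entries[i];
		if (!e->refcnt) {
			e->vport = vport;
			e->ip_version = ip_version;
			e->refcnt = 1;
			return e;
		}
	}
	return NULL;
}

/* Put: drop one reference; the slot becomes free when the count hits 0. */
static void sketch_indir_put(struct sketch_indir_table *t, unsigned int vport,
			     unsigned int ip_version)
{
	struct sketch_indir_entry *e = sketch_indir_lookup(t, vport, ip_version);

	if (e)
		e->refcnt--;
}
```

Two gets for the same key return the same entry, and the entry disappears from lookups only after a matching number of puts.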
#
10742efc |
|
21-Jan-2021 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5e: VF tunnel TX traffic offloading When tunnel endpoint is on VF, driver still assumes that endpoint is on uplink and incorrectly configures encap rule offload according to that assumption. As a result, traffic is sent directly to the uplink and rules installed on representor of tunnel endpoint VF are ignored. Implement following changes to allow offloading tx traffic with tunnel endpoint on VF: - For tunneling flows perform route lookup on route and out devices pair. If out device is uplink and route device is VF of same physical port, then modify packet reg_c_0 metadata register (source port) with the value of VF vport. Use eswitch vhca_id->vport mapping introduced in one of previous patches in the series to obtain vport from route netdevice. - Recirculate encapsulated packets to VF vport in order to apply any flow rules installed on VF representor that match on encapsulated traffic. Only enable support for this functionality when all following conditions are true: - Hardware advertises capability to preserve reg_c_0 value on packet recirculation. - Vport metadata matching is enabled. - Termination tables are to be used by the flow. Example TC rules for VF tunnel traffic: 1. Rule that redirects packets from UL to VF rep that has the tunnel endpoint IP address: $ tc -s filter show dev enp8s0f0 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac 16:c9:a0:2d:69:2c src_mac 0c:42:a1:58:ab:e4 eth_type ipv4 ip_flags nofrag in_hw in_hw_count 1 action order 1: mirred (Egress Redirect to device enp8s0f0_0) stolen index 3 ref 1 bind 1 installed 377 sec used 0 sec Action statistics: Sent 114096 bytes 952 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 114096 bytes 952 pkt backlog 0b 0p requeues 0 cookie 878fa48d8c423fc08c3b6ca599b50a97 no_percpu used_hw_stats delayed 2. 
Rule that decapsulates the tunneled flow and redirects to destination VF representor: $ tc -s filter show dev vxlan_sys_4789 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac ca:2e:a7:3f:f5:0f src_mac 0a:40:bd:30:89:99 eth_type ipv4 enc_dst_ip 7.7.7.5 enc_src_ip 7.7.7.1 enc_key_id 98 enc_dst_port 4789 enc_tos 0 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key unset pipe index 2 ref 1 bind 1 installed 434 sec used 434 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen index 4 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 129936 bytes 1082 pkt backlog 0b 0p requeues 0 cookie ac17cf398c4c69e4a5b2f7aabd1b88ff no_percpu used_hw_stats delayed Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
9e51c0a6 |
|
20-Jan-2021 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5: E-Switch, Refactor rule offload forward action processing Following patches in the series extend forwarding functionality with VF tunnel TX and RX handling. Extract action forwarding processing code into dedicated functions to simplify further extensions: - Handle every forwarding case with dedicated function instead of inline code. - Extract forwarding dest dispatch conditional into helper function esw_setup_dests(). - Unify forwarding cleanup code in error path of mlx5_eswitch_add_offloaded_rule() and in rule deletion code of __mlx5_eswitch_del_rule() in new helper function esw_cleanup_dests() (dual to new esw_setup_dests() helper). This patch does not change functionality. Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
84ae9c1f |
|
23-Sep-2020 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5e: E-Switch, Maintain vhca_id to vport_num mapping Following patches in the series need to be able to map VF netdev to vport. Since it is trivial to obtain vhca_id from netdev, maintain mapping from vhca_id to vport_num inside eswitch offloads using xarray. Provide function mlx5_eswitch_vhca_id_to_vport() to be used by TC code in following patches to obtain the mapping. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
b055ecf5 |
|
12-Oct-2020 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: E-Switch, Refactor setting source port Setting the source port requires only the E-Switch and vport number. Refactor the function to get those parameters instead of passing the full attribute. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d970812b |
|
11-Dec-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-switch, Add eswitch helpers for SF vport Add helpers to enable/disable eswitch port, register its devlink port and load its representor. Signed-off-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d7f33a45 |
|
11-Dec-2020 |
Vu Pham <vuhuong@nvidia.com> |
net/mlx5: E-switch, Prepare eswitch to handle SF vport Prepare eswitch to handle SF vport during (a) querying eswitch functions (b) egress ACL creation (c) account for SF vports in total vports calculation Assign a dedicated placeholder for SFs vports and their representors. They are placed after VFs vports and before ECPF vports as below: [PF,VF0,...,VFn,SF0,...SFm,ECPF,UPLINK]. Change functions to map SF's vport numbers to indices when accessing the vports or representors arrays, and vice versa. Signed-off-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
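The array layout given in d7f33a45, [PF,VF0,...,VFn,SF0,...SFm,ECPF,UPLINK], implies a simple offset calculation when turning a vport into an array index. The sketch below illustrates that arithmetic under stated assumptions: all names are hypothetical, every vport kind is assumed to be present (the real driver may omit some, such as ECPF), and the kind plus ordinal pair stands in for the driver's actual vport numbers.

```c
#include <stddef.h>

/* Hypothetical vport kinds, in the order they appear in the array. */
enum sketch_vport_kind {
	SKETCH_PF,
	SKETCH_VF,
	SKETCH_SF,
	SKETCH_ECPF,
	SKETCH_UPLINK,
};

struct sketch_esw {
	size_t num_vfs;	/* slots reserved for VF vports */
	size_t num_sfs;	/* slots reserved for SF vports */
};

/* Map (kind, ordinal within kind) to an index in
 * [PF, VF0..VFn, SF0..SFm, ECPF, UPLINK]; returns -1 if out of range. */
static long sketch_vport_index(const struct sketch_esw *esw,
			       enum sketch_vport_kind kind, size_t n)
{
	switch (kind) {
	case SKETCH_PF:
		return 0;
	case SKETCH_VF:
		return n < esw->num_vfs ? (long)(1 + n) : -1;
	case SKETCH_SF:
		return n < esw->num_sfs ? (long)(1 + esw->num_vfs + n) : -1;
	case SKETCH_ECPF:
		return (long)(1 + esw->num_vfs + esw->num_sfs);
	case SKETCH_UPLINK:
		return (long)(2 + esw->num_vfs + esw->num_sfs);
	}
	return -1;
}
```

With four VF slots and two SF slots, SF0 lands at index 5, directly after VF3 at index 4, and ECPF and UPLINK take the last two slots.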
#
873d2f12 |
|
09-Dec-2020 |
Zheng Yongjun <zhengyongjun3@huawei.com> |
net: mlx5: convert comma to semicolon Replace a comma between expression statements by a semicolon. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ade84367 |
|
01-Dec-2020 |
Zhu Yanjun <zyjzyj2000@gmail.com> |
net/mlx5e: remove unnecessary memset Since kvzalloc will initialize the allocated memory, it is not necessary to initialize it once again. Fixes: 11b717d61526 ("net/mlx5: E-Switch, Get reg_c0 value on CQE") Signed-off-by: Zhu Yanjun <yanjunz@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
036e19b9 |
|
29-Aug-2020 |
Hamdan Igbaria <hamdani@mellanox.com> |
net/mlx5: E-Switch, Support flow source for local vport Set flow source as hint for local vport. Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c7eddc60 |
|
31-Aug-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-switch, Move devlink eswitch ports closer to eswitch Currently devlink eswitch ports are registered and unregistered by the representor layer. However it is better to register them at eswitch layer so that in future user initiated command port add and delete commands can also register/unregister devlink ports without depending on representor layer. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
38679b5a |
|
31-Aug-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-switch, Use helper function to load unload representor To register and unregister devlink ports when loading/unload representors, refactor the code to helper functions. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2c40db2f |
|
01-Sep-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-switch, Add helper to check egress ACL need Currently only VF vports need egress ACL table. Add a generic helper to check whether a vport needs egress ACL table or not. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
7cd7becd |
|
10-Sep-2020 |
sunils <sunils@nvidia.com> |
net/mlx5: E-switch, Use PF num in metadata reg c0 Currently only 256 vports can be supported as only 8 bits are reserved for them and 8 bits are reserved for vhca_ids in metadata reg c0. To support more than 256 vports, replace vhca_id with a unique shorter 4-bit PF number which covers up to 16 PFs. Use remaining 12 bits for vports ranging 1-4095. This will continue to generate unique metadata even if multiple PCI devices have same switch_id. Signed-off-by: sunils <sunils@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
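The bit split described in 7cd7becd (a 4-bit PF number plus a 12-bit vport number) can be illustrated with a small packing helper. This is a sketch, not the driver's code: the macro and function names are hypothetical, and placing the PF number in the bits directly above the vport number is an assumption, since the commit message only states the field widths.

```c
#include <stdint.h>

/* Hypothetical names; only the field widths come from the commit message. */
#define SKETCH_PFNUM_BITS 4u
#define SKETCH_VPORT_BITS 12u
#define SKETCH_PFNUM_MASK ((1u << SKETCH_PFNUM_BITS) - 1u)
#define SKETCH_VPORT_MASK ((1u << SKETCH_VPORT_BITS) - 1u)

/* Pack a PF number and a vport number into one 16-bit metadata value.
 * Assumed layout: [pf_num:4][vport_num:12]. Out-of-range inputs are masked. */
static inline uint16_t sketch_metadata_pack(uint32_t pf_num, uint32_t vport_num)
{
	return (uint16_t)(((pf_num & SKETCH_PFNUM_MASK) << SKETCH_VPORT_BITS) |
			  (vport_num & SKETCH_VPORT_MASK));
}

/* Extract the PF number from a packed value. */
static inline uint32_t sketch_metadata_pf(uint16_t md)
{
	return (uint32_t)md >> SKETCH_VPORT_BITS;
}

/* Extract the vport number from a packed value. */
static inline uint32_t sketch_metadata_vport(uint16_t md)
{
	return md & SKETCH_VPORT_MASK;
}
```

Two PFs that both expose vport 5 still produce distinct values under this layout, because the PF number occupies separate bits from the vport number.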
#
c620b772 |
|
29-Apr-2020 |
Ariel Levkovich <lariel@mellanox.com> |
net/mlx5: Refactor tc flow attributes structure In order to support chains and connection tracking offload for nic flows, there's a need to introduce a common flow attributes struct so that these features can be agnostic and have access to a single attributes struct, regardless of the flow type. Therefore, a new tc flow attributes format is introduced to allow access to attributes that are common to eswitch and nic flows. The common attributes will always get allocated for the new flows, regardless of their type, while the type specific attributes are separated into different structs and will be allocated based on the flow type to avoid memory waste. When allocating the flow attributes the caller provides the flow steering namespace and according the namespace type the additional space for the extra, type specific, attributes is determined and added to the total attribute allocation size. In addition, the attributes that are going to be common to both flow types are moved to the common attributes struct. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
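The allocation scheme described in c620b772 (common attributes always allocated, with extra space sized by the steering namespace type) can be sketched with a flexible array member. All struct, enum, and function names below are hypothetical and the fields are placeholders. The sketch only shows how the total allocation size depends on the flow type.

```c
#include <stddef.h>
#include <stdlib.h>

/* All names here are hypothetical; they only illustrate the sizing idea. */
enum sketch_ns_type { SKETCH_NS_FDB, SKETCH_NS_KERNEL };

struct sketch_esw_attr { int in_rep; int out_count; };	/* eswitch-only fields */
struct sketch_nic_attr { int hairpin; };		/* NIC-only fields */

struct sketch_flow_attr {
	unsigned int action;	/* fields common to both flow types */
	unsigned int chain;
	unsigned int prio;
	max_align_t priv[];	/* type-specific tail, suitably aligned */
};

/* Extra bytes needed after the common part for a given namespace type. */
static size_t sketch_priv_size(enum sketch_ns_type type)
{
	return type == SKETCH_NS_FDB ? sizeof(struct sketch_esw_attr)
				     : sizeof(struct sketch_nic_attr);
}

/* Allocate zeroed common attributes plus the type-specific tail.
 * The caller releases the result with free(). */
static struct sketch_flow_attr *sketch_alloc_flow_attr(enum sketch_ns_type type)
{
	return calloc(1, sizeof(struct sketch_flow_attr) + sketch_priv_size(type));
}
```

A caller that knows the flow type would cast `priv` to the matching type-specific struct. Code that only needs the common fields never has to look at the tail.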
#
ae430332 |
|
24-Apr-2020 |
Ariel Levkovich <lariel@mellanox.com> |
net/mlx5: Refactor multi chains and prios support Decouple the chains infrastructure from eswitch and make it generic to support other steering namespaces. The change defines an agnostic data structure to keep all the relevant information for maintaining flow table chaining in any steering namespace. Each namespace that requires table chaining will be required to allocate such data structure. The chains creation code will receive the steering namespace and flow table parameters from the caller so it will operate agnostically when creating the required resources to maintain the table chaining function. Parts of the code that are relevant to eswitch specific functionality are moved to eswitch files. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6cec0229 |
|
05-Aug-2020 |
Maor Dickman <maord@mellanox.com> |
net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported The cited commit creates peer miss group during switchdev mode initialization in order to handle miss packets correctly while in VF LAG mode. This is done regardless of FW support of such groups which could cause rule setup failures later on. Fix by adding FW capability check before creating peer groups/rule. Fixes: ac004b832128 ("net/mlx5e: E-Switch, Add peer miss rules") Signed-off-by: Maor Dickman <maord@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
cd1ef966 |
|
18-May-2020 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Use vport metadata matching by default Multiple features use metadata matching such as bond vport in live migration, multi-port RoCE mode, stacked devices; hence, enable vport metadata matching by default. Fixes: 1e62e222db2e ("net/mlx5: E-Switch, Use vport metadata matching only when mandatory") Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
fc99c3d6 |
|
22-May-2020 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Setup all vports' metadata to support peer miss rule In merged eswitch configuration, peer miss rule is setup for all vports. If metadata is enabled, peer miss rule with metadata matching will be configured instead of source port matching; however, some vports that have not yet been enabled don't have default_metadata setup and their default_metadata will be zero. Hence, setup/cleanup default metadata for all vports when eswitch moves in/out of offloads mode. Fixes: 133dcfc577ea ("net/mlx5: E-Switch, Alloc and free unique metadata for match") Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
406493a5 |
|
23-Jun-2020 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Dedicated metadata for uplink vport Uplink vport must have a dedicated metadata with vhca_id being part of the metadata. Fixes: 133dcfc577ea ("net/mlx5: E-Switch, Alloc and free unique metadata for match") Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
4e9a9ef7 |
|
23-Jun-2020 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Check and enable metadata support flag before using Check E-Switch capabilities and enable metadata support flag before using it to setup other features that need metadata. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
a53cf949 |
|
08-Sep-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-switch, Read controller number from device ECPF supports one external host controller. Read controller number from the device. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6f7bbad1 |
|
02-Jul-2020 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5e: E-Switch, Specify flow_source for rule with no in_port The flow_source must be specified, even for rule without matching source vport, because some actions are only allowed in uplink. Otherwise, rule can't be offloaded and firmware syndrome happens. Fixes: 6fb0701a9cfa ("net/mlx5: E-Switch, Add support for offloading rules with no in_port") Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
0faddfe6 |
|
01-Jul-2020 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5e: E-Switch, Add misc bit when misc fields changed for mirroring The modified flow_context fields in FTE must be indicated in modify_enable bitmask. Previously, the misc bit in modify_enable is always set as source vport must be set for each rule. So, when parsing vxlan/gre/geneve/qinq rules, this bit is not set because those are all from the same misc fields that source vport fields are located at, and we don't need to set the indicator twice. After adding per vport tables for mirroring, misc bit is not set, then firmware syndrome happens. To fix it, set the bit wherever misc fields are changed. This also makes it unnecessary to check misc fields and set the misc bit accordingly in metadata matching, so here remove it. Besides, flow_source must be specified for uplink because firmware will check it and some actions are only allowed for packets received from uplink. Fixes: 96e326878fa5 ("net/mlx5e: Eswitch, Use per vport tables for mirroring") Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c8b838d1 |
|
27-Jul-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
net/mlx5: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
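The conversion in c8b838d1 replaces fall-through comments with a statement the compiler can check. The sketch below shows the shape of such a switch in user-space C. `sketch_fallthrough` is a hypothetical stand-in defined here so the example compiles outside the kernel, and the function itself is invented for illustration.

```c
/* Hypothetical stand-in for the kernel's fallthrough pseudo-keyword, so the
 * example builds in user space. Falls back to an empty statement when the
 * compiler has no fallthrough attribute. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define sketch_fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef sketch_fallthrough
# define sketch_fallthrough do {} while (0) /* fallthrough */
#endif

/* Count how many stages run when starting at the given stage.
 * Each case intentionally continues into the next one. */
static int sketch_count_steps(int stage)
{
	int steps = 0;

	switch (stage) {
	case 2:
		steps++;
		sketch_fallthrough;	/* was: a fall through comment */
	case 1:
		steps++;
		sketch_fallthrough;
	case 0:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```

Unlike a comment, the attribute form lets the compiler's implicit fall-through warning distinguish intentional fall-through from a forgotten `break`.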
#
8d6bd3c3 |
|
20-Jul-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Use eswitch total_vports Currently steering table and rx group initialization helper routines work on the total_vports passed as input parameter. Both eswitch helpers work on the mlx5_eswitch and thereby have access to esw->total_vports. Hence use it directly instead of passing it via function input arguments. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
0da3c12d |
|
20-Jul-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Reuse total_vports and avoid duplicate nvports Total e-switch vports are already stored in mlx5_eswitch total_vports. Avoid copy of it in nvports and reuse existing total_vports calculation. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
8b95bda4 |
|
20-Jul-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Consider maximum vf vports for steering init When eswitch is enabled, VFs might not be enabled. Hence, consider maximum number of VFs. This further closes the gap between handling VF vports between ECPF and PF. Fixes: ea2128fd632c ("net/mlx5: E-switch, Reduce dependency on num_vfs during mode set") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
ea2128fd |
|
26-Jun-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Reduce dependency on num_vfs during mode set Currently only ECPF allows enabling eswitch when SR-IOV is disabled. Enable PF also to enable eswitch when SR-IOV is disabled. Load VF vports when eswitch is already enabled. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
bd939753 |
|
18-Jun-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Introduce and use eswitch support check helper Introduce a helper routine to get esw from a devlink device and use it at eswitch callbacks and in subsequent patch. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
133dcfc5 |
|
28-Feb-2020 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Alloc and free unique metadata for match Introduce infrastructure to create unique metadata for match for vport without depending on vport_num. Vport uses its default metadata for match in standalone configuration but will share a different unique "bond_metadata" for match with other vports in bond configuration. Use ida to generate unique metadata for match for vports in default and bond configurations. Introduce APIs to generate, free metadata for match. Introduce APIs to set vport's bond_metadata and replace its ingress acl rules with bond_metadata. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
07bab950 |
|
28-Mar-2020 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Refactor eswitch ingress acl codes Restructure the eswitch ingress acl codes into eswitch directory and different files: . Acl ingress helper functions to acl_helper.c/h . Acl ingress functions used in offloads mode to acl_ingress_ofld.c . Acl ingress functions used in legacy mode to acl_ingress_lgy.c This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com>
|
#
ea651a86 |
|
06-Nov-2019 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Refactor eswitch egress acl codes Refactor the egress acl codes so that offloads and legacy modes can configure specifically their own needs of egress acl table, groups and rules. While at it, restructure the eswitch egress acl codes into eswitch directory and different files: . Acl egress helper functions to acl_helper.c/h . Acl egress functions used in offloads mode to acl_egress_ofld.c . Acl egress functions used in legacy mode to acl_egress_lgy.c This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
14e6b038 |
|
03-Feb-2020 |
Eli Cohen <eli@mellanox.com> |
net/mlx5e: Add support for hw decapsulation of MPLS over UDP MPLS over UDP is supported in hardware by using a packet reformat object with reformat type equal L3_TUNNEL_TO_L2 which both decapsulates the outer L3, L4 and MPLS headers, and allows for setting the L2 headers of the resulting decapsulated packet. For the hardware to operate correctly, the configuration of the firmware must have FLEX_PARSER_PROFILE_ENABLE = 1. Example tc rule: tc filter add dev bareudp0 protocol all prio 1 root flower enc_dst_port \ 6635 enc_src_ip 8.8.8.23 action mpls pop protocol ip pipe \ action pedit ex munge eth dst set 00:11:22:33:44:21 pipe action \ mirred egress redirect dev enp59s0f0_0 We use pedit to set the correct destination MAC. For MPLS over UDP decapsulation to take place, the driver logic requires the following: 1. flower filter added on bareudp device. 2. action mpls pop 3. zero or more pedit munge actions 4. one redirect action Current implementation supports only IPv4 and no VLAN. tc filter show output looks like this: filter protocol all pref 1 flower chain 0 filter protocol all pref 1 flower chain 0 handle 0x1 enc_src_ip 8.8.8.24 enc_dst_port 6635 in_hw in_hw_count 1 action order 1: mpls pop protocol ip pipe index 2 ref 1 bind 1 action order 2: pedit action pipe keys 2 index 1 ref 1 bind 1 key #0 at eth+0: val 00112233 mask 00000000 key #1 at eth+4: val 44210000 mask 0000ffff action order 3: mirred (Egress Redirect to device enp59s0f0_0) stolen index 2 ref 1 bind 1 Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d956873f |
|
12-May-2020 |
Vlad Buslov <vladbu@mellanox.com> |
net/mlx5e: Introduce kconfig var for TC support In order to improve code maintainability and readability, introduce new CONFIG_MLX5_CLS_ACT kconfig variable to control compilation of TC hardware offloads implementation. This allows distinguishing between features that require TC support (MPLSoUDP, etc.) and features that just rely on representor functionality (rep_bond for live migration, etc.). Modify rep_tc.h, rep_neigh.h, en_tc.h and chains.h files to provide stubs for functions that are called from generic code. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f8d1edda |
|
21-Apr-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Fix mutex init order In cited patch mutex is initialized after it is used. Below call trace is observed. Fix the order to initialize the mutex early enough. Similarly follow mirror sequence during cleanup. kernel: DEBUG_LOCKS_WARN_ON(lock->magic != lock) kernel: WARNING: CPU: 5 PID: 45916 at kernel/locking/mutex.c:938 __mutex_lock+0x7d6/0x8a0 kernel: Call Trace: kernel: ? esw_vport_tbl_get+0x3b/0x250 [mlx5_core] kernel: ? mark_held_locks+0x55/0x70 kernel: ? __slab_free+0x274/0x400 kernel: ? lockdep_hardirqs_on+0x140/0x1d0 kernel: esw_vport_tbl_get+0x3b/0x250 [mlx5_core] kernel: ? mlx5_esw_chains_create_fdb_prio+0xa57/0xc20 [mlx5_core] kernel: mlx5_esw_vport_tbl_get+0x88/0xf0 [mlx5_core] kernel: mlx5_esw_chains_create+0x2f3/0x3e0 [mlx5_core] kernel: esw_create_offloads_fdb_tables+0x11d/0x580 [mlx5_core] kernel: esw_offloads_enable+0x26d/0x540 [mlx5_core] kernel: mlx5_eswitch_enable_locked+0x155/0x860 [mlx5_core] kernel: mlx5_devlink_eswitch_mode_set+0x1af/0x320 [mlx5_core] kernel: devlink_nl_cmd_eswitch_set_doit+0x41/0xb0 Fixes: 96e326878fa5 ("net/mlx5e: Eswitch, Use per vport tables for mirroring") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e9864539 |
|
20-Apr-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Fix printing wrong error value When mlx5_modify_header_alloc() fails, instead of printing the error value returned, current error log prints 0. Fix by printing correct error value returned by mlx5_modify_header_alloc(). Fixes: 6724e66b90ee ("net/mlx5: E-Switch, Get reg_c1 value on miss") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
79949985 |
|
20-Apr-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Fix error unwinding flow for steering init failure Error unwinding is done incorrectly in the cited commit. When steering init fails, there is no need to perform steering cleanup. When vport error exists, error cleanup should be mirror of the setup routine, i.e. to perform steering cleanup before metadata cleanup. This avoids the call trace in accessing uninitialized objects which are skipped during steering_init() due to failure in steering_init(). Call trace: mlx5_cmd_modify_header_alloc:805:(pid 21128): too many modify header actions 1, max supported 0 E-Switch: Failed to create restore mod header BUG: kernel NULL pointer dereference, address: 00000000000000d0 [ 677.263079] mlx5_destroy_flow_group+0x13/0x80 [mlx5_core] [ 677.268921] esw_offloads_steering_cleanup+0x51/0xf0 [mlx5_core] [ 677.275281] esw_offloads_enable+0x1a5/0x800 [mlx5_core] [ 677.280949] mlx5_eswitch_enable_locked+0x155/0x860 [mlx5_core] [ 677.287227] mlx5_devlink_eswitch_mode_set+0x1af/0x320 [ 677.293741] devlink_nl_cmd_eswitch_set_doit+0x41/0xb0 [ 677.299217] genl_rcv_msg+0x1eb/0x430 Fixes: 7983a675ba65 ("net/mlx5: E-Switch, Enable chains only if regs loopback is enabled") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
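The unwinding rule stated in 79949985 (no steering cleanup when steering init itself fails, and otherwise cleanup as a mirror of setup) is the standard goto-label idiom. The sketch below models it with stub stages that only flip flags. Every name is hypothetical, and the three-stage order (metadata, steering, vports) is an assumption drawn from the commit message's wording rather than from the driver source.

```c
#include <stdbool.h>

/* Hypothetical state: which stages are up, plus knobs to inject failures. */
struct sketch_state {
	bool metadata;
	bool steering;
	bool vports;
	bool fail_steering;
	bool fail_vports;
	int steering_cleanups;	/* how often steering cleanup ran */
};

static int sketch_metadata_init(struct sketch_state *s)
{
	s->metadata = true;
	return 0;
}

static void sketch_metadata_cleanup(struct sketch_state *s)
{
	s->metadata = false;
}

static int sketch_steering_init(struct sketch_state *s)
{
	if (s->fail_steering)
		return -1;
	s->steering = true;
	return 0;
}

static void sketch_steering_cleanup(struct sketch_state *s)
{
	s->steering = false;
	s->steering_cleanups++;
}

static int sketch_vports_load(struct sketch_state *s)
{
	if (s->fail_vports)
		return -1;
	s->vports = true;
	return 0;
}

/* Bring the stages up in order; on failure undo only what succeeded,
 * in reverse order. */
static int sketch_offloads_enable(struct sketch_state *s)
{
	int err;

	err = sketch_metadata_init(s);
	if (err)
		return err;

	err = sketch_steering_init(s);
	if (err)
		goto err_steering_init;	/* steering never came up: skip its cleanup */

	err = sketch_vports_load(s);
	if (err)
		goto err_vports;

	return 0;

err_vports:
	sketch_steering_cleanup(s);	/* mirror of setup: steering first... */
err_steering_init:
	sketch_metadata_cleanup(s);	/* ...then metadata */
	return err;
}
```

Because each label undoes only the stages that completed before the failing one, a steering init failure never reaches the steering cleanup, which is the situation the call trace in the commit message describes.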
#
d65dbedf |
|
24-Apr-2020 |
Huy Nguyen <huyn@mellanox.com> |
net/mlx5: Add support for COPY steering action Add COPY type to modify_header action. IPsec feature is the first feature that needs COPY steering action. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com>
|
#
e08a6832 |
|
08-Apr-2020 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Update eswitch to new cmd interface Do mass update of eswitch to reuse newly introduced mlx5_cmd_exec_in*() interfaces. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
84be2fda |
|
28-Mar-2020 |
Eli Cohen <eli@mellanox.com> |
net/mlx5: Fix condition for termination table cleanup When we destroy rules from slow path we need to avoid destroying termination tables since termination tables are never created in slow path. By doing so we avoid destroying the termination table created for the slow path. Fixes: d8a2034f152a ("net/mlx5: Don't use termination tables in slow path") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
49964352 |
|
13-Mar-2020 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: E-Switch: Move eswitch chains to a new directory eswitch_offloads_chains.{c,h} were just introduced this kernel release cycle, eswitch is in high development demand right now and many features are planned to be added to it. eswitch deserves its own directory and here we move these new files to there, in preparation for upcoming eswitch features and new files. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com>
|
#
8e0aa4bc |
|
18-Dec-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Protect eswitch mode changes Currently eswitch mode change is occurring from 2 different execution contexts as below. 1. sriov sysfs enable/disable 2. devlink eswitch set commands Both of them need to access eswitch related data structures in synchronized manner. Without any synchronization the race condition below exists. SR-IOV enable/disable with devlink eswitch mode change: cpu-0 cpu-1 ----- ----- mlx5_device_disable_sriov() mlx5_devlink_eswitch_mode_set() mlx5_eswitch_disable() esw_offloads_stop() esw_offloads_disable() mlx5_eswitch_disable() esw_offloads_disable() Hence, they are synchronized using a new mode_lock. eswitch's state_lock is not used as it can lead to a deadlock scenario below and state_lock is only for vport and fdb exclusive access. ip link set vf <param> netlink rcv_msg() - Lock A rtnl_lock vfinfo() esw->state_lock() - Lock B devlink eswitch_set devlink_mutex esw->state_lock() - Lock B attach_netdev() register_netdev() rtnl_lock - Lock A Alternatives considered: 1. Acquiring rtnl lock before taking esw->state_lock to follow similar locking sequence as ip link flow during eswitch mode set. rtnl lock is not good idea for two reasons. (a) Holding rtnl lock for several hundred device commands is not good idea. (b) It leads to below and more similar deadlocks. devlink eswitch_set devlink_mutex rtnl_lock - Lock A esw->state_lock() - Lock B eswitch_disable() reload() ib_register_device() ib_cache_setup_one() rtnl_lock() 2. Exporting devlink lock may lead to undesired use of it in vendor driver(s) in future. 3. Unloading representors outside of the mode_lock requires serialization with other process trying to enable the eswitch. 4. Deferring the representors life cycle to a different workqueue requires synchronization with func_change_handler workqueue.
Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
ebf77bb8 |
|
18-Dec-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Extend eswitch enable to handle num_vfs change Subsequent patch protects eswitch mode changes across sriov and devlink interfaces. It is desirable for eswitch to provide thread safe eswitch enable and disable APIs. Hence, extend eswitch enable API to optionally update num_vfs when requested. In subsequent patch, eswitch num_vfs are updated after all the eswitch users eswitch drops its reference count. Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
ae24432c |
|
14-Dec-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Split eswitch mode check to different helper function In order to check eswitch state under a lock, prepare code to split capability check and eswitch state check into two helper functions. Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c8508713 |
|
19-Mar-2020 |
Roi Dayan <roid@mellanox.com> |
net/mlx5: E-Switch, free flow_group_in after creating the restore table We allocate a temporary memory but forget to free it. Fixes: 11b717d61526 ("net/mlx5: E-Switch, Get reg_c0 value on CQE") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
7983a675 |
|
18-Mar-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Enable chains only if regs loopback is enabled Register c0 loopback is needed to fully support chains and prios. Enable chains and prio only if loopback (of reg c1 which came together with c0), is enabled. To be able to check that, move enabling of loopback before eswitch chains init. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
60acc105 |
|
18-Mar-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Enable restore table only if reg_c1 is supported Reg c0/c1 matching, rewrite of regs c0/c1, and copy header of regs c1,B is needed for the restore table to function, might not be supported by firmware, and creation of the restore table or the copy header will fail. Check reg_c1 loopback support, as firmware which supports this, should have all of the above. Fixes: 11b717d61526 ("net/mlx5: E-Switch, Get reg_c0 value on CQE") Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d8a2034f |
|
26-Feb-2020 |
Eli Cohen <eli@mellanox.com> |
net/mlx5: Don't use termination tables in slow path Don't use termination tables for packets that are steered to the slow path, as a pre-step for supporting packet encap (packet reformat) action on termination tables. Packet encap (reformat action) actions steer the packet to the slow path until outer arp entries are resolved. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
0e6fa491 |
|
17-Dec-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Avoid deriving mlx5_core_dev second time All callers needs to work on mlx5_core_dev and it is already derived before calling mlx5_devlink_eswitch_check(). Hence, accept mlx5_core_dev in mlx5_devlink_eswitch_check(). Given that it works on mlx5_core_dev change helper function name to drop devlink prefix. Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2bb72e7e |
|
14-Dec-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Annotate termtbl_mutex mutex destroy Annotate mutex destroy to keep it symmetric to init sequence. It should be destroyed after its users (representor netdevices) are destroyed in below flow. esw_offloads_disable() esw_offloads_unload_rep() Hence, initialize the mutex before creating the representors which uses it. Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5c2aa8ae |
|
17-Jan-2020 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: Accept flow rules without match Allow passing NULL spec when creating a flow rule. Such rules will act as "catch all" flow rules. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
4110fc59 |
|
12-Nov-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Refactor unload all reps per rep type Following introduction of per vport configuration of vport and rep, unload all reps per rep type is still needed as IB reps can be unloaded individually. However, a few internal functions exist purely for this purpose, merge them to a single function. This patch doesn't change any existing functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
23bb50cf |
|
12-Nov-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Update VF vports config when num of VFs changed Currently, ECPF eswitch manager does one-time only configuration for VF vports when device switches to offloads mode. However, when num of VFs changed from host side, driver doesn't update VF vports configurations. Hence, perform VFs vport configuration update whenever num_vfs change event occurs. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c2d7712c |
|
11-Nov-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Introduce per vport configuration for eswitch modes Both legacy and offload modes require vport setup, only offload mode requires rep setup. Before this patch, vport and rep operations are separated applied to all relevant vports in different stages. Change to use per vport configuration, so that vport and rep operations are modularized per vport. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6fb0701a |
|
11-Mar-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Add support for offloading rules with no in_port FTEs in global tables may match on packets from multiple in_ports. Provide the capability to omit the in_port match condition. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d18296ff |
|
11-Mar-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Introduce global tables Currently, flow tables are automatically connected according to their <chain,prio,level> tuple. Introduce global tables which are flow tables that are detached from the eswitch chains processing, and will be connected by explicitly referencing them from multiple chains. Add this new table type, and allow connecting them by reference. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b7cb745 |
|
11-Mar-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Enable reg c1 loopback when possible Enable reg c1 loopback if firmware reports it's supported, as this is needed for restoring packet metadata (e.g chain). Also define helper to query if it is enabled. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc617ced |
|
18-Dec-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, make query inline mode a static function mlx5_eswitch_inline_mode_get() is used only in eswitch_offloads.c. Hence, make it static and adjacent to its caller function. Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d9fb932f |
|
04-Mar-2020 |
Dan Carpenter <dan.carpenter@oracle.com> |
net/mlx5e: Fix an IS_ERR() vs NULL check The esw_vport_tbl_get() function returns error pointers on error. Fixes: 96e326878fa5 ("net/mlx5e: Eswitch, Use per vport tables for mirroring") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
1e62e222 |
|
27-Jan-2020 |
Majd Dibbiny <majd@mellanox.com> |
net/mlx5: E-Switch, Use vport metadata matching only when mandatory Multi-port RoCE mode requires tagging traffic that passes through the vport. This matching can cause performance degradation, therefore disable it and use the legacy matching on vhca_id and source_port when possible. Fixes: 92ab1eb392c6 ("net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it") Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
87dac697 |
|
26-Dec-2019 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5e: Add devlink fdb_large_groups parameter Add a devlink parameter to control the number of large groups in a autogrouped flow table. The default value is 15, and the range is between 1 and 1024. The size of each large group can be calculated according to the following formula: size = 4M / (fdb_large_groups + 1). Examples: - Set the number of large groups to 20. $ devlink dev param set pci/0000:82:00.0 name fdb_large_groups \ cmode driverinit value 20 Then run devlink reload command to apply the new value. $ devlink dev reload pci/0000:82:00.0 - Read the number of large groups in flow table. $ devlink dev param show pci/0000:82:00.0 name fdb_large_groups pci/0000:82:00.0: name fdb_large_groups type driver-specific values: cmode driverinit value 20 Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
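The group-size formula quoted in this commit can be checked with a short calculation. This is a sketch under two assumptions that the commit message does not state: that "4M" means 2**22 flow table entries, and that the division rounds down. The function name `large_group_size` is hypothetical.

```python
# Assumption: "4M" in the commit message means 2**22 entries.
FDB_TABLE_ENTRIES = 4 * 1024 * 1024


def large_group_size(fdb_large_groups=15):
    """Return size = 4M / (fdb_large_groups + 1), rounded down (assumed).

    The default of 15 and the 1..1024 range are taken from the commit message.
    """
    if not 1 <= fdb_large_groups <= 1024:
        raise ValueError("fdb_large_groups must be between 1 and 1024")
    return FDB_TABLE_ENTRIES // (fdb_large_groups + 1)
```

Under these assumptions the default of 15 groups gives 262144 entries per large group, and raising the parameter to 20 shrinks each group to 199728 entries.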
|
#
96e32687 |
|
14-Jan-2020 |
Eli Cohen <eli@mellanox.com> |
net/mlx5e: Eswitch, Use per vport tables for mirroring When using port mirroring, we forward the traffic to another table and use that table to forward to the mirrored vport. Since the hardware loses the values of reg c, and in particular reg c0, we fail the match on the input vport which previously existed in reg c0. To overcome this situation, we use a set of per vport tables, positioned at the lowest priority, and forward traffic to those tables. Since these tables are per vport, we can avoid matching on reg c0. Fixes: c01cfd0f1115 ("net/mlx5: E-Switch, Add match on vport metadata for rule in fast path") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6724e66b |
|
15-Feb-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Get reg_c1 value on miss The HW model implicitly decapsulates tunnels on chain 0 and sets reg_c1 with the mapped tunnel id. On miss, the packet does not have the outer header and the driver restores the tunnel information from the tunnel id. Getting reg_c1 value in software requires enabling reg_c1 loopback and copying reg_c1 to reg_b. reg_b comes up on CQE as cqe->imm_inval_pkey. Use the reg_c0 restoration rules to also copy reg_c1 to reg_B. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
11b717d6 |
|
15-Feb-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Get reg_c0 value on CQE On RX side create a restore table in OFFLOADS namespace. This table will match on all values for reg_c0 we will use, and set it to the flow_tag. This flow tag can then be read on the CQE. As there is no copy action from reg c0 to flow tag, instead we have to set the flow tag explicitly. We add an API so callers can add all the used reg_c0 values (tags) and for each of those we add a restore rule. This will be used in a following patch to save the miss chain mapping tag on reg_c0 and from it restore the tc chain on the skb. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
0f0d3827 |
|
15-Feb-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Move source port on reg_c0 to the upper 16 bits Multi chain support requires the miss path to continue the processing from the last chain id, and for that we need to save the chain miss tag (a mapping for 32bit chain id) on reg_c0 which will come in a next patch. Currently reg_c0 is exclusively used to store the source port metadata, giving it 32bit, it is created from 16bits of vhca_id, and 16bits of vport number. We will move this source port metadata to upper 16bits, and leave the lower bits for the chain miss tag. We compress the reg_c0 source port metadata to 16bits by taking 8 bits from vhca_id, and 8bits from the vport number. Since we compress the vport number to 8bits statically, and leave two top ids for special PF/ECPF numbers, we will only support a max of 254 vports with this strategy. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
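The reg_c0 split described in this commit can be sketched as plain bit arithmetic. The commit message gives only the field widths (8 bits of vhca_id, 8 bits of vport number, lower 16 bits left for the chain miss tag). The order of the two 8-bit fields inside the upper half, the use of the low 8 bits of vhca_id, and the names `pack_reg_c0` and `unpack_reg_c0` are assumptions made for illustration. The special PF/ECPF ids are not modelled.

```python
def pack_reg_c0(vhca_id, vport, chain_tag=0):
    """Pack source port metadata into the upper 16 bits of a 32-bit value.

    Assumed layout: bits 31..24 = low 8 bits of vhca_id,
    bits 23..16 = compressed vport number, bits 15..0 = chain miss tag.
    """
    if not 0 <= vport <= 0xFF:
        raise ValueError("vport must fit in 8 bits")
    if not 0 <= chain_tag <= 0xFFFF:
        raise ValueError("chain_tag must fit in 16 bits")
    source_port = ((vhca_id & 0xFF) << 8) | vport
    return (source_port << 16) | chain_tag


def unpack_reg_c0(reg_c0):
    """Return (vhca_id low 8 bits, vport, chain_tag) from a packed value."""
    return ((reg_c0 >> 24) & 0xFF, (reg_c0 >> 16) & 0xFF, reg_c0 & 0xFFFF)
```

Keeping the chain miss tag in the low half means a match on the source port only has to mask the upper 16 bits, which is consistent with the split the commit describes.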
|
#
383de108 |
|
12-Feb-2020 |
Dmytro Linkin <dmitrolin@mellanox.com> |
net/mlx5e: Don't clear the whole vf config when switching modes There is no need to reset all vf config (except link state) between legacy and switchdev modes changes. Also, set link state to AUTO, when legacy enabled. Fixes: 3b83b6c2e024 ("net/mlx5e: Clear VF config when switching modes") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
3b83b6c2 |
|
13-Jan-2020 |
Dmytro Linkin <dmitrolin@mellanox.com> |
net/mlx5e: Clear VF config when switching modes Currently VF in LEGACY mode are not able to go up. Also in OFFLOADS mode, when switching to it first time, VF can go up independently of its representor, which is not expected. Perform clearing of VF config when switching modes and set link state to AUTO as default value. Also, when switching to OFFLOADS mode set link state to DOWN, which allows VF link state to be controlled by its REP. Fixes: 1ab2068a4c66 ("net/mlx5: Implement vports admin state backup/restore") Fixes: 556b9d16d3f5 ("net/mlx5: Clear VF's configuration on disabling SRIOV") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
93b8a7ec |
|
31-Dec-2019 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: Fix lowest FDB pool size The pool sizes represent the pool sizes in the fw. When we request a pool size from fw, it will return the next possible group. We track how many pools the fw has left and start requesting groups from the big to the small. When we start requesting a 4k group, which doesn't exist in fw, fw wants to allocate the next possible size, 64k, but will fail since it's exhausted. The correct smallest pool size in fw is 128 and not 4k. Fixes: e52c28024008 ("net/mlx5: E-Switch, Add chains and priorities") Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
278d51f2 |
|
20-Nov-2019 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Increase number of chains and priorities Increase the number of chains and priorities to support the whole range available in tc. We use unmanaged tables and ignore flow level to create more tables than what we declared to fs_core steering, and we manage the connections between the tables themselves. To support that we need FW with ignore_flow_level capability. Otherwise the old behaviour will be used, where we are limited by the number of levels we declared (4 chains, 16 prios). Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
39ac237c |
|
07-Jan-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Refactor chains and priorities To support the entire chain and prio range (32bit + 16bit), instead of using a static array of chains/prios of limited size, create them dynamically, and use a rhashtable to search for existing chains/prio combinations. This will be used in next patch to actually increase the number using unmanaged tables support and ignore flow level capability. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
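The move from a fixed array to a hash table keyed by the chain/prio pair can be sketched as a lookup-or-create map. A Python dict stands in for the kernel rhashtable here. The class name `ChainPrioTables`, the placeholder table value, and the reference counting are assumptions for illustration; the commit message only states that entries are created dynamically and found through a hash lookup.

```python
class ChainPrioTables:
    """Lookup-or-create map from (chain, prio) to a table placeholder."""

    def __init__(self):
        # (chain, prio) -> [table placeholder, reference count]
        self._entries = {}

    def get(self, chain, prio):
        """Return the table for (chain, prio), creating it on first use."""
        key = (chain, prio)
        entry = self._entries.get(key)
        if entry is None:
            entry = [f"table(chain={chain}, prio={prio})", 0]
            self._entries[key] = entry
        entry[1] += 1
        return entry[0]

    def put(self, chain, prio):
        """Drop one reference; remove the entry when nobody uses it."""
        key = (chain, prio)
        entry = self._entries[key]
        entry[1] -= 1
        if entry[1] == 0:
            del self._entries[key]

    def __len__(self):
        return len(self._entries)
```

Because entries exist only while in use, memory scales with the number of chain/prio combinations actually referenced rather than with the full 32-bit by 16-bit range.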
|
#
e66cbc96 |
|
26-Nov-2019 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: ft: Use getter function to get ft chain FT chain is defined as the next chain after tc. To prepare for next patches that will increase the number of tc chains available at runtime, use a getter function to get this value. The define is still used in static fs_core allocation, to calculate the number of chains. This static allocation will be used if the relevant capabilities won't be available to support dynamic chains. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
61dc7b01 |
|
14-Nov-2019 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: Refactor mlx5_create_auto_grouped_flow_table Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param which already carries the max_fte, prio and flags members, and is used the same in similar mlx5_create_flow_table() function. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b7826076 |
|
12-Nov-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5e: E-switch, Fix Ingress ACL groups in switchdev mode for prio tag In cited commit, when prio tag mode is enabled, FTE creation fails due to missing group with valid match criteria. Hence, (a) create prio tag group metadata_prio_tag_grp when prio tag is enabled with match criteria for vlan push FTE. (b) Rename metadata_grp to metadata_allmatch_grp to reflect its purpose. Also when priority tag is enabled, delete metadata settings after deleting ingress rules, which are using it. Tidy up rest of the ingress config code for unnecessary labels. Fixes: 10652f39943e ("net/mlx5: Refactor ingress acl configuration") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
34b13cb3 |
|
11-Nov-2019 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: Accumulate levels for chains prio namespaces Tc chains are implemented by creating a chained prio steering type, and inside it there is a namespace for each chain (FDB_TC_MAX_CHAINS). Each of those has a list of priorities. Currently, all namespaces in a prio start at the parent prio level. But since we can jump from chain (namespace) to another chain in the same prio, we need the levels for higher chains to be higher as well. So we created unused prios to account for levels in previous namespaces. Fix that by accumulating the namespaces levels if we are inside a chained type prio, and removing the unused prios. Fixes: 328edb499f99 ('net/mlx5: Split FDB fast path prio to multiple namespaces') Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2cf2954b |
|
11-Nov-2019 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines Rename it to prepare for next patch that will add a different type of offload to the FDB. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f382b0df |
|
28-Oct-2019 |
Roi Dayan <roid@mellanox.com> |
net/mlx5e: Fix eswitch debug print of max fdb flow The value is already the calculation so remove the log prefix. Fixes: e52c28024008 ("net/mlx5: E-Switch, Add chains and priorities") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
9ea7f01f |
|
05-Nov-2019 |
Colin Ian King <colin.king@canonical.com> |
net/mlx5: fix spelling mistake "metdata" -> "metadata" There is a spelling mistake in a esw_warn warning message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
556b9d16 |
|
03-Sep-2019 |
Aya Levin <ayal@mellanox.com> |
net/mlx5: Clear VF's configuration on disabling SRIOV When setting number of VFs to 0 (disable SRIOV), clear VF's configuration. Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
238302fa |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Enable metadata on own vport Currently on ECPF, metadata is enabled on the ECPF vport = 0xfffe (manager vport). Metadata when supported, must be enabled on own vport which is used to pass metadata to vport of NIC Rx Flow Table. Due to this error, traffic tagged by ingress ACL is not processed correctly at NIC rx flow table level which is supposed to work on metadata tag. Hence, instead of working on eswitch manager vport, always working on eswitch own vport regardless of PF or ECPF. Given that mlx5_eswitch_query/modify_esw_vport_context() is used to access other vport in legacy mode and own vport settings in switchdev mode, extend low level API to explicitly specify other_vport. Fixes: c1286050cf47 ("net/mlx5: E-Switch, Pass metadata from FDB to eswitch manager") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
10652f39 |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Refactor ingress acl configuration Drop, untagged, spoof check and untagged spoof check flow groups are limited to legacy mode only. Therefore, following refactoring is done to (a) improve code readability (b) have better code split between legacy and offloads mode 1. Move legacy flow groups under legacy structure 2. Add validity check for group deletion 3. Restrict scope of esw_vport_disable_ingress_acl to legacy mode 4. Rename esw_vport_enable_ingress_acl() to esw_vport_create_ingress_acl_table() and limit its scope to table creation 5. Introduce legacy flow groups creation helper esw_legacy_create_ingress_acl_groups() and keep its scope to legacy mode 6. Reduce offloads ingress groups from 4 to just 1 metadata group per vport 7. Removed redundant IS_ERR_OR_NULL as entries are marked NULL on free. 8. Shorten error message to remove redundant 'E-switch' Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a962d7a6 |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Restrict metadata disablement to offloads mode Now that there is clear separation for acl setup/cleanup between legacy and offloads mode, limit metadata disablement to offloads mode. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
748da30b |
|
28-Oct-2019 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-switch, Offloads shift ACL programming during enable/disable vport Currently legacy mode enables ACL while enabling vport, while offloads mode enable ACL when moving to offloads mode. Bring consistency to both modes by enabling/disabling ACL when enabling/disabling a vport. It also eliminates creating ingress ACL table on unused ECPF vport in offloads mode. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
89a0f1fb |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Offloads introduce and use per vport acl tables APIs Introduce and use per vport ACL tables creation and destroy APIs, so that subsequently patch can use them during enabling/disabling a vport. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
925a6acc |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Prepare code to handle vport enable error In subsequent patch, esw_enable_vport() could fail and return error. Prepare code to handle such error. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d68316b5 |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Move metdata fields under offloads structure Metadata fields are offload mode specific. To improve code readability, move metadata under offloads structure. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
fdde49e0 |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Introduce and use vlan rule config helper Between legacy mode and switchdev mode, only two fields are changed, vlan_tag and flow action. Hence to avoid duplicate code between two modes, introduce and use helper function to configure allowed VLAN rule. While at it, get rid of duplicate debug message. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b1a3380a |
|
28-Oct-2019 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Rename ingress acl config in offloads mode Changing the function name esw_ingress_acl_common_config() to esw_ingress_acl_config() to be consistent with egress config function naming in offloads mode. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6d94e610 |
|
28-Oct-2019 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Rename egress config to generic name Refactor vport egress config in offloads mode Refactoring vport egress configuration in offloads mode that includes egress prio tag configuration. This makes code symmetric to ingress configuration. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
752d3dc0 |
|
29-Aug-2019 |
Dmytro Linkin <dmitrolin@mellanox.com> |
net/mlx5e: Remove incorrect match criteria assignment line The driver has a function which enables match criteria for misc parameters depending on eswitch capabilities. Fixes: 4f5d1beadc10 ("Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e53e6655 |
|
04-Sep-2019 |
Colin Ian King <colin.king@canonical.com> |
net/mlx5: fix missing assignment of variable err The error return from a call to mlx5_flow_namespace_set_peer is not being assigned to variable err and hence the error check following the call is currently not working. Fix this by assigning ret as intended. Addresses-Coverity: ("Logically dead code") Fixes: 8463daf17e80 ("net/mlx5: Add support to use SMFS in switchdev mode") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
8463daf1 |
|
18-Aug-2019 |
Maor Gottlieb <maorg@mellanox.com> |
net/mlx5: Add support to use SMFS in switchdev mode In case that flow steering mode of the driver is SMFS (Software Managed Flow Steering), then use the DR (SW steering) API to create the steering objects. In addition, add a call to the set peer namespace when switchdev gets devcom pair event. It is required to support VF LAG in SMFS. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2b688ea5 |
|
15-Aug-2019 |
Maor Gottlieb <maorg@mellanox.com> |
net/mlx5: Add flow steering actions to fs_cmd shim layer Add flow steering actions: modify header and packet reformat to the fs_cmd shim layer. This allows each namespace to define possibly different functionality for alloc/dealloc action commands. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
ef2e4094 |
|
26-Jul-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Removed unused hwid Currently mlx5_eswitch_rep stores same hw ID for all representors. However it is never used from this structure. It is always used from mlx5_vport. Hence, remove unused field. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
93b3586e |
|
17-Jul-2019 |
Huy Nguyen <huyn@mellanox.com> |
net/mlx5: Support inner header match criteria for non decap flow action We have an issue that OVS application creates an offloaded drop rule that drops VXLAN traffic with both inner and outer header match criteria. mlx5_core driver detects correctly the inner and outer header match criteria but does not enable the inner header match criteria due to an incorrect assumption in mlx5_eswitch_add_offloaded_rule that only decap rule needs inner header criteria. Solution: Remove mlx5_esw_flow_attr's match_level and tunnel_match_level and add two new members: inner_match_level and outer_match_level. inner/outer_match_level is set to NONE if the inner/outer match criteria is not specified in the tc rule creation request. The decap assumption is removed and the code just needs to check for inner/outer_match_level to enable the corresponding bit in firmware's match_criteria_enable value. Fixes: 6363651d6dd7 ("net/mlx5e: Properly set steering match levels for offloaded TC decap rules") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5896b972 |
|
29-Jul-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Tide up eswitch config sequence Currently for PF and ECPF vports, representors are created before their eswitch hardware ports are initialized in the flow below: mlx5_eswitch_enable() esw_offloads_init() esw_offloads_load_all_reps() [..] esw_enable_vport() However, for VFs, vports are initialized before creating their respective netdev representors in event handling context. Similarly, while disabling the eswitch, hardware vports are disabled first, followed by destroying their representors. Here the underlying vports get destroyed while their respective user-facing netdevices can still exist, on which the user can continue to perform more offload operations. Instead, it is more accurate to do:
enable_eswitch switchdev mode:
1. perform FDB tables initialization
2. initialize hw vport
3. create and publish representor for this vport
disable_eswitch switchdev mode:
1. destroy user facing representor for the vport
2. disable hw vport
3. perform FDB tables cleanup
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
332bd3a5 |
|
29-Jul-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Combine metadata enable/disable functionality Except bit toggling code, rest of the code is same to enable/disable metadata passing functionality. Hence, combine them to single function and control using enable flag. Also instead of checking metadata supported at multiple places, fold into the helper function. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
0e18134f |
|
11-Sep-2018 |
Vlad Buslov <vladbu@mellanox.com> |
net/mlx5e: Eswitch, use state_lock to synchronize vlan change esw->state_lock is already used to protect vlan vport configuration change. However, all preparation and correctness checks, and code that sets vport data are not protected by this lock and assume external synchronization by rtnl lock. In order to remove dependency on rtnl lock, extend esw->state_lock protection to whole eswitch vlan add/del functions. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
525e84be |
|
18-Nov-2018 |
Vlad Buslov <vladbu@mellanox.com> |
net/mlx5e: Eswitch, change offloads num_flows type to atomic64 Eswitch implements its own locking by means of state_lock mutex and multiple fine-grained locks in containing data structures, and is supposed to not rely on rtnl lock. However, eswitch offloads num_flows type is a regular long long integer and cannot be modified concurrently. This is an implicit assumption that mlx5 tc is serialized (by rtnl lock or any other means). In order to remove the implicit dependency on rtnl lock, change num_flows type to atomic64 to allow concurrent modifications. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
3a5ee3b3 |
|
14-Jul-2019 |
Fuqian Huang <huangfq.daxian@gmail.com> |
ethernet: remove redundant memset kvzalloc already zeroes the memory during the allocation. pci_alloc_consistent calls dma_alloc_coherent directly. In commit 518a2f1925c3 ("dma-mapping: zero memory returned from dma_alloc_*"), dma_alloc_coherent has already zeroed the memory. So the memset after these function is not needed. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9446d17e |
|
11-Jul-2019 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: E-Switch, Reduce ingress acl modify metadata stack usage Fix the following compiler warning: In function ‘esw_vport_add_ingress_acl_modify_metadata’: the frame size of 1084 bytes is larger than 1024 bytes [-Wframe-larger-than=] Since the structure is never written to, we can statically allocate it to avoid the stack usage. Fixes: 7445cfb1169c ("net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ingress ACLs") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9a64144d |
|
17-Jun-2019 |
Maor Gottlieb <maorg@mellanox.com> |
net/mlx5: E-Switch, Fix default encap mode Encap mode is related to switchdev mode only. Move the init of the encap mode to eswitch_offloads. Before this change, we reported that eswitch supports encap, even though the device was in non SRIOV mode. Fixes: 7768d1971de67 ('net/mlx5: E-Switch, Add control for encapsulation') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
dd28087c |
|
07-Jun-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Refactor mlx5_esw_query_functions for modularity The output data size of the functions change event changes when functions other than VFs are enabled in HCA CAP. With the current API, multiple callers need to align and calculate the accurate size of the output data depending on the number of non-VF functions enabled in the device. Instead of duplicating such math at multiple places, refactor mlx5_esw_query_functions() to return raw output allocated by itself. Caller must free the allocated memory using kvfree() as described in the function comment section. This hides the calculation within mlx5_esw_query_functions() and provides a simpler API. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
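The ownership convention described above (callee computes the size and allocates, caller frees) can be sketched in userspace C. This is an assumption-laden illustration: `query_functions()` and `query_out_size()` are hypothetical names, the size formula is made up, and `calloc()`/`free()` stand in for the kernel allocator pair that the commit message refers to via kvfree().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical size rule: a fixed header plus one entry per non-VF function. */
static size_t query_out_size(int non_vf_functions)
{
    return 16 + 8 * (size_t)non_vf_functions;
}

/*
 * Returns a zeroed buffer holding the raw output, or NULL on allocation
 * failure. The size math lives here only; the caller owns the buffer and
 * must release it with free().
 */
static unsigned char *query_functions(int non_vf_functions, size_t *out_len)
{
    size_t len = query_out_size(non_vf_functions);
    unsigned char *out = calloc(1, len);

    if (!out)
        return NULL;
    *out_len = len;
    return out;
}
```

A caller would invoke `query_functions()`, read whatever fields it needs from the returned buffer, and then call `free()` on it; no caller repeats the size calculation.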
|
#
7e736f9a |
|
07-Jun-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-Switch prepare functions change handler to be modular Eswitch function change handler will service multiple type of events for VFs and non VF functions update. Hence, introduce and use the helper function esw_vfs_changed_event_handler() for handling change in num VFs to improve the code readability. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2752b823 |
|
14-May-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Introduce and use mlx5_eswitch_get_total_vports() Instead MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports(). mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF vports as well. Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to more generic vport.h header file. Such exposure is not desired. Hence a mlx5_eswitch_get_total_vports() is introduced. Given that mlx5_eswitch_get_total_vports() API wants to work on const mlx5_core_dev*, change its helper functions also to accept const *dev. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
411ec9e0 |
|
28-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop When ECPF is the eswitch manager, host PF is treated like other VFs. Driver should do the same for inline mode and vlan pop. Add new iterators to include host PF if ECPF is the eswitch manager. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
db68cc56 |
|
28-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Use iterator for vlan and min-inline setups Use the defined iterators to traverse VF reps/vports. Also, rely on the number of VFs rather than the counter of enabled vports, as the PF will also be enabled from the ECPF side and the counter will differ from the number of VFs. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
16fff98a |
|
28-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Reg/unreg function changed event at correct stage When driver is doing eswitch mode change, it's critical to keep number of enabled VFs unchanged. However, it can be changed on the fly once function changed event is registered. To remove this uncertainty, function changed event should not be registered before all setups, and first be unregistered before all cleanups. Wrap this functionality together with vport event handler. Fixes: 61fc880839e6 ("net/mlx5: E-Switch, Handle representors creation in handler context") Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
062f4bf4 |
|
28-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Consolidate eswitch function number of VFs The number of enabled VFs is key for the eswitch manager to do flow steering initialization and vport configurations. However, the number of enabled VFs may come from two sources as below.
PF: num of VFs is provided by enabled SR-IOV of itself.
ECPF: num of VFs is provided by enabled SR-IOV from its peer PF. And SR-IOV can't be enabled from ECPF itself.
The current driver handles the two cases in different stages and passes the number of enabled VFs among a large scope of internal functions. It is usually hard to find out where the real number of VFs comes from due to layers of argument pass-in. This patch consolidates that number at the entry point of eswitch setup and maintains a copy so that eswitch functions can refer to it directly. The eswitch driver shall always use this number when referring to the number of enabled VFs and must not use other numbers such as the one from SR-IOV. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f6455de0 |
|
28-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Refactor eswitch SR-IOV interface Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF can be at offload mode when SR-IOV is not enabled. Rename the interface and eswitch mode names to decouple from SR-IOV, and cleanup eswitch messages accordingly. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e1d974d0 |
|
28-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: Handle host PF vport mac/guid for ECPF When ECPF is eswitch manager, it has the privilege to query and configure the mac and node guid of host PF. While vport number of host PF is 0, the vport command should be issued with other_vport set in this case as the cmd is issued by ECPF vport(0xfffe). Add a specific function to query own vport mac. Low level functions are used by vport manager to query/modify any vport mac and node guid. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5ccf2770 |
|
28-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: Don't handle VF func change if host PF is disabled When ECPF eswitch manager is at offloads mode, it monitors functions changed event from host PF side and acts according to the number of VFs enabled/disabled. As ECPF and host PF work in two independent hosts, it's possible that host PF OS reboots but ECPF system is still kept on and continues monitoring events from host PF. When kernel from host PF side is booting, PCI iov driver does sriov_init and compute_max_vf_buses by iterating over all valid num of VFs. This triggers FLR and generates functions changed events, even though host PF HCA is not enabled at this time. However, ECPF is not aware of this information, and still handles these events as usual. ECPF system will see massive number of reps are created, but destroyed immediately once creation finished. To eliminate this noise, a bit is added to host parameter context to indicate host PF is disabled. ECPF will not handle the VF changed event if this bit is set. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2f69e591 |
|
28-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
{IB, net}/mlx5: E-Switch, Use index of rep for vport to IB port mapping In the single IB device mode, the mapping between vport number and rep relies on a counter. However for dynamic vport allocation, it is desired to keep consistent map of eswitch vport and IB port. Hence, simplify code to remove the free running counter and instead use the available vport index during load/unload sequence from the eswitch. Signed-off-by: Bodong Wang <bodong@mellanox.com> Suggested-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d6518db2 |
|
28-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Use vport index when init rep Driver is referring to the array index when doing rep initialization, using vport is confusing as it's normally interpreted as vport number. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
92ab1eb3 |
|
25-Jun-2019 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it As the ingress ACL rules save vhca id and vport number to packet's metadata REG_C_0, and the metadata matching for the rules in both fast path and slow path are all added, enable this feature if supported. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a5641cb5 |
|
25-Jun-2019 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5: E-Switch, Add match on vport metadata for rule in slow path In slow path, packet that not matched by any offloaded rule is forwarded to eswitch vport manager for further processing. Add matching on metadata for peer miss rules in FDB, and rules which forward packet to correct representor in esw manager NIC_RX table. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c1286050 |
|
25-Jun-2019 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5: E-Switch, Pass metadata from FDB to eswitch manager In order to do matching on metadata in slow path when demuxing traffic to representors, explicitly enable the feature that allows HW to pass metadata REG_C_0 from FDB to eswitch manager NIC_RX table. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c01cfd0f |
|
25-Jun-2019 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5: E-Switch, Add match on vport metadata for rule in fast path If FW's capabilities and configurations meet the requirement of vport metadata matching, this feature will be used. As the information about vport number and vhca_id related to the packet is already stored in its metadata register, which is used as an indicator for a particular vport, now we can change to match on this metadata for all the offloading rules in fast path. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
7445cfb1 |
|
25-Jun-2019 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ingress ACLs When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives at its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport, as it is not represented by a single VHCA only in this case. So we change to match on metadata instead of source vport. To do that, a rule is created in all vports and uplink ingress ACLs, to save the source vport number and vhca id in the packet's metadata in order to match on it later. The metadata register used is the first of the 32-bit type C registers. It can be used for matching and header modify operations. The higher 16 bits of this register are for the vhca id, and the lower 16 bits are for the vport number. This change is not for dual-port RoCE only. If HW and FW allow, the vport metadata matching is enabled by default. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
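The register layout stated above (vhca id in the upper 16 bits, vport number in the lower 16 bits of a 32-bit value) can be sketched as follows. The helper names are hypothetical and chosen for this sketch; they are not taken from the driver.

```c
#include <stdint.h>

/* Pack vhca id (upper 16 bits) and vport number (lower 16 bits). */
static uint32_t pack_metadata(uint16_t vhca_id, uint16_t vport_num)
{
    return ((uint32_t)vhca_id << 16) | vport_num;
}

/* Extract the vhca id from the upper half of the metadata value. */
static uint16_t metadata_vhca_id(uint32_t metadata)
{
    return (uint16_t)(metadata >> 16);
}

/* Extract the vport number from the lower half of the metadata value. */
static uint16_t metadata_vport_num(uint32_t metadata)
{
    return (uint16_t)(metadata & 0xffffu);
}
```

Because both fields share a single 32-bit value, a rule can match on the combined value rather than on the source vport alone.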
|
#
91d6291c |
|
25-Jun-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Introduce a helper API to check VF vport Introduce a helper API mlx5_eswitch_is_vf_vport() to check if a given vport_num belongs to VF or not. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
98fdbea5 |
|
12-Jun-2019 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Declare more strictly devlink encap mode Devlink has UAPI declaration for encap mode, so there is no need to be loose on the data get/set by drivers. Update call sites to use enum devlink_eswitch_encap_mode instead of plain u8. Suggested-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Petr Vorel <pvorel@suse.cz>
|
#
10ee82ce |
|
10-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Return raw output for query esw functions Current function only returns host num of VFs, later patch requires other params such as host maximum num of VFs. Return the raw output so that caller can extract info as needed. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
ac35dcd6 |
|
10-Jun-2019 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Handle representors creation in handler context Unified representors creation in esw_functions_changed context handler. Emulate the esw_function_changed event for FW/HW that does not support this event. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
10caabda |
|
18-Apr-2019 |
Oz Shlomo <ozsh@mellanox.com> |
net/mlx5e: Use termination table for VLAN push actions HW does not support push VLAN action in the RX direction (packets arriving from the wire). The FW works around this limitation by hairpinning the packet. The hairpin workaround applies only when the push VLAN action is specified in a termination table, assuring that there are no actions following the hairpin. Instantiate termination table for push VLAN actions. Re-use identical terminating tables for increased HW cache efficiency. Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d4a18e16 |
|
30-Jan-2019 |
Yevgeny Kliteynik <kliteyn@mellanox.com> |
net/mlx5e: Enable setting multiple match criteria for flow group When filling in flow spec match criteria, to allow previous modifications of the match criteria, use "|=" rather than "=". Tunnel options are parsed before the match criteria of the offloaded flow are being set. If the flow that we're about to offload has encapsulation options, the flow group might need to match on additional criteria. For Geneve, an additional flow group matching parameter should be used - misc3. The appropriate bit in the match criteria is set while parsing the tunnel options, so the criteria value shouldn't be overwritten. This is a pre-step for supporting Geneve TLV options offload. Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
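The difference between "=" and "|=" described above can be shown with a small sketch. The bit names and functions below are hypothetical placeholders for this illustration and do not reflect the driver's actual enum values.

```c
#include <stdint.h>

/* Hypothetical criteria bits, for illustration only. */
enum {
    CRITERIA_OUTER = 1u << 0,
    CRITERIA_MISC3 = 1u << 1,
};

/* Earlier step: tunnel option parsing sets its own bit first. */
static void parse_tunnel_options(uint8_t *criteria)
{
    *criteria |= CRITERIA_MISC3;
}

/* Overwriting with "=" loses the bit set during tunnel parsing. */
static uint8_t build_overwrite(void)
{
    uint8_t criteria = 0;

    parse_tunnel_options(&criteria);
    criteria = CRITERIA_OUTER;
    return criteria;
}

/* Accumulating with "|=" keeps the earlier bit. */
static uint8_t build_accumulate(void)
{
    uint8_t criteria = 0;

    parse_tunnel_options(&criteria);
    criteria |= CRITERIA_OUTER;
    return criteria;
}
```

Only the accumulating variant ends up with both bits set, which is the behavior the commit message asks for.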
|
#
8693115a |
|
29-May-2019 |
Parav Pandit <parav@mellanox.com> |
{IB,net}/mlx5: Constify rep ops functions pointers Currently for every representor type and for every single vport, a copy of the representor function pointers is stored even though they don't change from one vport to another. Additionally, the priv data entry for the rep is not passed during registration, but it is copied. It is used (set and cleared) by the user of the reps. As we want to scale vports, to simplify and also to split constants from data:
1. Rename mlx5_eswitch_rep_if to mlx5_eswitch_rep_ops as to match _ops prefix with other standard netdev, ibdev ops.
2. Constify the IB and Ethernet rep ops structure.
3. Instead of storing copy of all rep function pointers, store copy per eswitch rep type.
4. Split data and function pointers to mlx5_eswitch_rep_ops and mlx5_eswitch_rep_data.
Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6706a3b9 |
|
29-May-2019 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Honor eswitch functions changed event cap Whenever device supports eswitch functions changed event, honor such device setting. Do not limit it to ECPF. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
cd56f929 |
|
29-May-2019 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Replace host_params event with functions_changed event To support sriov on a E-Switch manager, num_vfs are queried to the firmware whenever E-Switch manager is notified by esw_functions_changed event. Replace host_params event with esw_functions_changed event that reflects more appropriate naming. While at it, also correct num_vfs type from int to u16 as expected by the function mlx5_esw_query_functions(). Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
95585800 |
|
13-May-2019 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5e: Fix number of vports for ingress ACL configuration With the cited commit, ACLs are configured for the VF ports. The loop for the number of ports had the wrong number. Fix it. Fixes: 184867373d8c ("net/mlx5e: ACLs for priority tag mode") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
02f3afd9 |
|
05-Apr-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index To avoid any ambiguity between vport index and vport number, rename functions that had vport, to vport_num or vport_index appropriately. vport_num is u16 hence change mlx5_eswitch_index_to_vport_num() return type to u16. vport_index is an int in vport array. Hence change input type of vport index in mlx5_eswitch_index_to_vport_num() to int. Correct multiple eswitch representor interfaces use type u16 of rep->vport as type int vport_index. Send vport FW commands with correct eswitch u16 vport_num instead host int vport_index. Fixes: 5ae5162066d8 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6f4e0219 |
|
18-Apr-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Use atomic rep state to serialize state change When the state of rep was introduced, it was also designed to prevent duplicate unloading of the same rep. Consider the following two flows when an eswitch manager is at switchdev mode with n VF reps loaded.
+--------------------------------------+--------------------------------+
| cpu-0                                | cpu-1                          |
| --------                             | --------                       |
| mlx5_ib_remove                       | mlx5_eswitch_disable_sriov     |
| mlx5_ib_unregister_vport_reps        | esw_offloads_cleanup           |
| mlx5_eswitch_unregister_vport_reps   | esw_offloads_unload_all_reps   |
| __unload_reps_all_vport              | __unload_reps_all_vport        |
+--------------------------------------+--------------------------------+
These two flows will try to unload the same rep. Per original design, once one flow unloads the rep, the state moves to REGISTERED. The 2nd flow then no longer needs to do the unload and bails out. However, as read and write of the state is not atomic, when the 1st flow is doing the unload, the state is still LOADED, so the 2nd flow is able to do the same unload action. A kernel crash will happen. To solve this, the driver should do an atomic test-and-set for the state, so that only one flow can change the rep state from LOADED to REGISTERED and proceed to do the actual unloading. Since the state is changing to atomic type, all other reads/writes should be atomic actions as well. Fixes: f121e0ea9586 (net/mlx5: E-Switch, Add state to eswitch vport representors) Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
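The atomic test-and-set idea described above can be sketched in userspace C11. This is not the kernel code: it uses `<stdatomic.h>` as an analogue of the kernel's own atomic helpers, and the `struct rep`, the state names and `rep_unload_once()` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical rep states mirroring the ones named in the message. */
enum rep_state { REP_UNREGISTERED, REP_REGISTERED, REP_LOADED };

struct rep {
    atomic_int state;
    int unload_calls;  /* counts how often the real unload work ran */
};

/*
 * Only the flow that wins the LOADED -> REGISTERED transition performs the
 * unload. A second caller sees the state already changed and bails out.
 */
static bool rep_unload_once(struct rep *rep)
{
    int expected = REP_LOADED;

    if (!atomic_compare_exchange_strong(&rep->state, &expected,
                                        REP_REGISTERED))
        return false;

    rep->unload_calls++;  /* stands in for the actual unload work */
    return true;
}
```

The compare-and-exchange makes the check and the state change a single step, so two racing flows cannot both observe LOADED and both unload.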
|
#
786ef904 |
|
20-Apr-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Reuse mlx5_esw_for_each_vf_vport macro in two files Currently mlx5_esw_for_each_vf_vport iterates over mlx5_vport entries in eswitch.c. Same macro in eswitch_offloads.c iterates over vport number in eswitch_offloads.c. Instead of duplicate macro names, to avoid confusion and to reuse the same macro in both files, move it to eswitch.h. To iterate over vport numbers where there is no need to iterate over mlx5_vport, but only a vport number is needed, rename those macros in eswitch_offloads.c to mlx5_esw_for_each_vf_num_vport*. While at it, keep all vport and vport rep iterators together. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
18486737 |
|
03-Mar-2019 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5e: ACLs for priority tag mode Current ConnectX HW is unable to perform VLAN pop in TX path and VLAN push on RX path. As a workaround, untagged packets are tagged with VID 0x000 allowing pop/push actions to be exchanged with VLAN rewrite actions. Use the ingress ACL table, preceding the FDB, to push VLAN 0x000 ID tag for untagged packets and the egress ACL table, succeeding the FDB, to pop VLAN 0x000 ID tag. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
80f09dfc |
|
29-Apr-2019 |
Maor Gottlieb <maorg@mellanox.com> |
net/mlx5: Eswitch, enable RoCE loopback traffic When in switchdev mode, we would like to treat loopback RoCE traffic (on eswitch manager) as RDMA and not as regular Ethernet traffic. In order to enable it we add a flow steering rule that forwards RoCE loopback traffic to the HW RoCE filter (by adding an allow rule). In addition we add a RoCE address in GID index 0, which will be set in the RoCE loopback packet. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b6d9ccb1 |
|
28-Mar-2019 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, don't use hardcoded values for FDB prios When creating the FDB prios, use the enum values already defined and not the hardcoded values. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
eda99e11 |
|
27-Feb-2019 |
Max Gurtovoy <maxg@mellanox.com> |
net/mlx5: E-Switch, Fix double mutex initialization Delete mutex_init call of a lock that's initialized in inner function. Fixes: eca8cc389535 ("net/mlx5: E-Switch, Refactor offloads flow steering init/cleanup") Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5c1d260e |
|
21-Mar-2019 |
Roi Dayan <roid@mellanox.com> |
net/mlx5: E-Switch, Protect from invalid memory access in offload fdb table The esw offloads structures share a union with the legacy mode structs. Reset the offloads struct to zero in init to protect from null assumptions made by the legacy mode code. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
ee576ec1 |
|
21-Mar-2019 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5e: Fix compilation warning in en_tc.c Amazingly a mlx5e_tc function is being called from the eswitch layer, which is by itself very terrible! The function was declared locally in eswitch_offloads.c so it could be used there, which caused the following compilation warning, fix that. drivers/.../mlx5/core/en_tc.c:3242:6: [-Werror=missing-prototypes] error: no previous prototype for ‘mlx5e_tc_clean_fdb_peer_flows’ Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows") Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2aca1787 |
|
21-Mar-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Rename total_vfs to total_vports Macro MLX5_TOTAL_VPORTS() returns total number of vports. Therefore, rename variable total_vfs to total_vports to improve code readability. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c96692fb |
|
20-Dec-2018 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Allow transition to offloads mode for ECPF Currently, the e-switch driver requires going to legacy mode before changing to the offloads mode. This makes sense for regular case as the legacy mode is done by creating VFs. However, it's problematic when ECPF is the eswitch manager. In such case, ECPF will control the vports on peer host including the peer PF and VFs. But ECPF doesn't need and shall not create VFs as the VFs are created in the peer PF host. Grant ECPF the ability to change from none to the offloads mode. Note that currently the only way to go back to none mode is by unloading the ECPF driver. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a3888f33 |
|
29-Jan-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Load/unload VF reps according to event from host PF When host PF changes the number of VFs, the ECPF esw driver will get a FW event. It should query the number of VFs enabled by host PF and update the VF reps accordingly. Note that host PF can't change the number of VFs dynamically, it has to reset the number of VFs to 0 before changing to a new positive number. The host event is registered when driver is moving to switchdev mode, and it's the last step to do in esw_offloads_init. It's unregistered and the work queue is flushed when driver quits from switchdev mode. In this way, the host event and devlink command are serialized. When driver is enabling switchdev mode, pay attention to the following two facts: 1. Host PF must not have VF initialized as the flow table in ECPF has ENCAP enabled as default. Such flow table can't be created with existing initialized VFs. 2. ECPF doesn't know how many VFs the host PF will enable, ECPF offloads flow steering shall create the flow table/groups based on the max number of VFs possibly supported by host PF. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
81cd229c |
|
10-Dec-2018 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Consider ECPF vport depends on eswitch ownership ECPF connects to the eswitch through vport 0xfffe. ECPF may or may not be the eswitch manager depending on firmware configuration. 1. If ECPF is eswitch manager: ECPF will take over the eswitch manager responsibility. A rep of the host PF shall be created at the ECPF side for the eswitch manager to control. 2. If ECPF is not eswitch manager: host PF will be the eswitch manager, ECPF acts like a VF to the host PF. Host PF will be aware of the ECPF vport presence and control its rep. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5ae51620 |
|
14-Dec-2018 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Assign a different position for uplink rep and vport In offloads mode, the current implementation puts the uplink representor at index zero of the vport reps array. It is not "natural" to place it at index 0 since we want to put the representor for vport 0 at index 0 with the introduction of SmartNIC. A separate patch will handle the case whether a rep is needed for vport 0 (PF vport). So, we want to have a different placeholder for uplink vport and representor. It was placed at the end of vport and rep array. Since vport number can no longer act as an index into the vport or representors arrays, use functions to map vport numbers to indices when accessing the vports or representors arrays, and vice versa. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
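The number-to-index mapping described in this commit can be illustrated with a minimal sketch. It is not the driver's code: the helper names are hypothetical, and the uplink vport number 0xFFFF is an assumption, since the message only says the uplink slot sits at the end of the arrays.

```c
#include <assert.h>

/* Assumed uplink vport number; the commit message does not state it. */
#define SKETCH_VPORT_UPLINK 0xFFFF

/*
 * Map a vport number to an array index (hypothetical helper).
 * Regular vports 0..total-2 map to themselves; the uplink takes the
 * last slot. Returns -1 for a number that has no slot.
 */
static int sketch_vport_num_to_index(int vport_num, int total_vports)
{
    if (vport_num == SKETCH_VPORT_UPLINK)
        return total_vports - 1;
    if (vport_num < 0 || vport_num >= total_vports - 1)
        return -1;
    return vport_num;
}

/* Inverse mapping: array index back to vport number, -1 if out of range. */
static int sketch_index_to_vport_num(int index, int total_vports)
{
    if (index < 0 || index >= total_vports)
        return -1;
    if (index == total_vports - 1)
        return SKETCH_VPORT_UPLINK;
    return index;
}
```

Routing every array access through such a pair of helpers is what lets the layout change later without touching the callers.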
|
#
f8e8fa02 |
|
31-Jan-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Centralize repersentor reg/unreg to eswitch driver Eswitch has two users: IB and ETH. They both register representors when mlx5 interface is added, and unregister the representors when mlx5 interface is removed. Ideally, each driver should only deal with the entities which are unique to itself. However, current IB and ETH drivers have to perform the following eswitch operations: 1. When registering, specify how many vports to register. This number is the same for both drivers which is the total available vport numbers. 2. When unregistering, specify the number of registered vports to do unregister. Also, unload the representors which are already loaded. It's unnecessary for eswitch driver to hand out the control of above operations to individual driver users, as they're not unique to each driver. Instead, such operations should be centralized to eswitch driver. This consolidates eswitch control flow, and simplifies the IB and ETH drivers. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
29d9fd7d |
|
29-Jan-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Support load/unload reps of specific vport types Currently the driver loads and unloads all reps in an unbreakable group. However, with ECPF, the reps of special vports such as uplink and host PF should always be loaded in switchdev mode where the reps for VFs will be loaded on-demand and unloaded on no-demand. This is a pre-step for that change. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f121e0ea |
|
29-Jan-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Add state to eswitch vport representors Currently the eswitch vport reps have a valid indicator, which is set on register and unset on unregister. However, a rep can be loaded or not loaded when doing unregister, current driver checks if the vport of that rep is enabled as a flag to imply the rep is loaded. However, for ECPF, this is not valid as the host PF will enable the vports for its VFs instead. Add three states: {unregistered, registered, loaded}, with the following state changes across different operations: create: (none) -> unregistered reg: unregistered -> registered load: registered -> loaded unload: loaded -> registered unreg: registered -> unregistered Note that the state shall only be updated inside eswitch driver rather than individual drivers such as ETH or IB. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Suggested-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
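The state transitions listed in this commit can be sketched as a small state machine. The enum, struct, and function names below are hypothetical and only illustrate the three states and the allowed transitions; they are not the driver's identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* The three states named in the commit message (identifiers are illustrative). */
enum sketch_rep_state {
    SKETCH_REP_UNREGISTERED,
    SKETCH_REP_REGISTERED,
    SKETCH_REP_LOADED,
};

enum sketch_rep_op {
    SKETCH_REP_OP_REG,
    SKETCH_REP_OP_LOAD,
    SKETCH_REP_OP_UNLOAD,
    SKETCH_REP_OP_UNREG,
};

struct sketch_rep {
    enum sketch_rep_state state;
};

/* create: (none) -> unregistered */
static void sketch_rep_create(struct sketch_rep *rep)
{
    rep->state = SKETCH_REP_UNREGISTERED;
}

/*
 * Apply one operation. The state changes only if the transition is one
 * of those listed in the message; otherwise it is left untouched and
 * false is returned.
 */
static bool sketch_rep_apply(struct sketch_rep *rep, enum sketch_rep_op op)
{
    switch (op) {
    case SKETCH_REP_OP_REG:     /* unregistered -> registered */
        if (rep->state != SKETCH_REP_UNREGISTERED)
            return false;
        rep->state = SKETCH_REP_REGISTERED;
        return true;
    case SKETCH_REP_OP_LOAD:    /* registered -> loaded */
        if (rep->state != SKETCH_REP_REGISTERED)
            return false;
        rep->state = SKETCH_REP_LOADED;
        return true;
    case SKETCH_REP_OP_UNLOAD:  /* loaded -> registered */
        if (rep->state != SKETCH_REP_LOADED)
            return false;
        rep->state = SKETCH_REP_REGISTERED;
        return true;
    case SKETCH_REP_OP_UNREG:   /* registered -> unregistered */
        if (rep->state != SKETCH_REP_REGISTERED)
            return false;
        rep->state = SKETCH_REP_UNREGISTERED;
        return true;
    }
    return false;
}
```

Because a loaded rep must pass through unload before unreg, the unregister path no longer needs to infer "loaded" from whether the vport is enabled.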
|
#
879c8f84 |
|
28-Jan-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Use getter and iterator to access vport/rep With only PF and VF, it is sufficient to have the vport/rep array index as the vport number. This is because PF and VF vport numbers are consecutive serial numbers. In downstream patches that introduce ECPF and UPLINK vports, the numbers are no longer consecutive. Use a getter to get a specific vport/rep, and use an iterator to traverse a list of vports/reps. This hides the translation between array index and vport number, and provides the flexibility of using a different translation mechanism in the future. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Suggested-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c9b99abc |
|
31-Jan-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Split VF and special vports for offloads mode When driver is entering offloads mode, there are two major tasks to do: initialize flow steering and create representors. Flow steering should make sure enough flow table/group spaces are reserved for all reps. Representors will be created in a group, all or none. With the introduction of ECPF, flow steering should still reserve the same spaces. But, the representors are not always loaded/unloaded in a single piece. Once ECPF is in offloads mode, it will get the number of VF changing event from host PF. In such scenario, only the VF reps should be loaded/unloaded, not the reps for special vports (such as the uplink vport). Thus, when entering offloads mode, driver should specify the total number of reps, and the number of VF reps separately. When leaving offloads mode, the cleanup should use the information self-contained in eswitch such as number of VFs. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
eca8cc38 |
|
07-Feb-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Refactor offloads flow steering init/cleanup E-switch offloads mode initialize/cleanup multiple steering related entities (flow table/group). Refactor these operations to internal helper functions for better block design. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a1b3839a |
|
08-Nov-2018 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Properly refer to the esw manager vport In SmartNIC mode, the eswitch manager is not necessarily the PF (vport 0). Use a helper function to get the correct eswitch manager vport number and cache on the eswitch instance for fast reference. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
cd7e4186 |
|
12-Feb-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Avoid magic numbers when initializing offloads mode When dealing with the offloads mode initialization, driver refers to the number of VFs and add magic number one (1) to take account of the uplink. This is not clear and will make the code less readable after adding other vports (e.g. host PF). As these are special vports compared to VF vports, add a helper macro to denote such special vports and eliminate the use of magic number. Moreover, when creating offloads flow table and groups, the driver reserves two more slots for UC and MC miss rules. Replace this magic number with a helper macro as well. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b05af6aa |
|
12-Feb-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Normalize the name of uplink vport number Driver used to name uplink vport as FDB_UPLINK_VPORT, it's hard to comply with the same naming convention along with the introduction of other vports. Use MLX5_VPORT as the prefix for such vports and relocate the uplink vport definition to public header file for the benefits of both net and IB drivers. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6363651d |
|
10-Jan-2019 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Properly set steering match levels for offloaded TC decap rules The match level computed by the driver gets to be wrong for decap rules with wildcarded inner packet match such as: tc filter add dev vxlan_sys_4789 protocol all parent ffff: prio 2 flower enc_dst_ip 192.168.0.9 enc_key_id 100 enc_dst_port 4789 action tunnel_key unset action mirred egress redirect dev eth1 The FW errs for a missing matching meta-data indicator for the outer headers (where we do have a match), and a wrong matching meta-data indicator for the inner headers (where we don't have a match). Fix that by taking into account the matching on the tunnel info and relating the match level of the encapsulated packet to the firmware inner headers indicator in case of decap. As for vxlan we mandate a match on the tunnel udp dst port, and in general we practically mandate a match on the source or dest ip for any IP tunnel, the fix was done in a minimal manner around the tunnel match parsing code. Fixes: d708f902989b ('net/mlx5e: Get the required HW match level while parsing TC flow matches') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
915fe1a0 |
|
13-Nov-2018 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Remove redundant reloading of the IB interface The reload of the IB interface done on the offloads stop call is redundant b/c we do that on mlx5_eswitch_disable_sriov(), remove it. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
04de7dda |
|
11-Nov-2018 |
Roi Dayan <roid@mellanox.com> |
net/mlx5e: Infrastructure for duplicated offloading of TC flows Under uplink LAG or multipath schemes, traffic that matches one flow might arrive on both uplink ports and transmitted through both as part of supporting aggregation and high-availability. To cope with the fact that the SW model might use logical SW port (e.g uplink team or bond) but we have two HW ports with e-switch on each, there are cases where in order to offload a SW TC rule we need to duplicate it to two HW flows. Since each HW rule has its own counter we also aggregate the counter of both rules when a flow stats query is executed from user-space. Introduce the changes for the different elements (add/delete/stats), currently nothing is duplicated. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
ac004b83 |
|
11-Nov-2018 |
Roi Dayan <roid@mellanox.com> |
net/mlx5e: E-Switch, Add peer miss rules In the sriov offloads mode, packets that are not matched by any other rule are sent towards the e-switch vport manager for further processing. Under upcoming patches (e.g for uplink LAG), packets sent from VF vports belonging to esw0 (e-switch related to PF0) might end up in esw1 (e-switch related to PF1) due to muxing logic applied by the FW. In such a case we still want the missed packet to be sent to the "base" esw manager vport in order to present the control plane a consistent view of the source (VF representor) port. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
8c4dc42b |
|
18-Nov-2018 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5e: Support multiple encapsulations for a TC flow Currently a flow is associated with a single encap structure. The FW extended destination features enables the driver to associate a flow with multiple encap instances. Change the encap id field from a flow scope to a per destination value in the flow attributes struct. Use the encaps array to associate a flow table entry with multiple encap entries. Update the neigh logic to offload only if all encapsulations used in a flow are connected, and un-offload upon the first one disconnected. Note that the driver can now support up to two encap destinations. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
1cc26d74 |
|
25-Nov-2018 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5e: Support header rewrite actions with remote port mirroring A rule with the following actions is split to a two level FDB: 1. Forward to local mirror vport 2. Header rewrite 3. Forward to local vport In the first level flow table, forward the packet to the local port and forward the packet to the second level flow table for header rewrite and local port forwarding. This configuration fails when mirroring to a remote encapsulated destination because currently an FTE cannot support encap and table destinations. Use the extended destination capabilities to configure the first level flow table with a multi-destination FTE to the uplink and second level table and the second level flow table for the header rewrite and local port forwarding. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a18e879d |
|
03-Dec-2018 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5e: Annul encap action ordering requirement Currently a FW syndrome is emitted if the driver configures a multi-destination FTE where the first destination is a tunneled uplink port and the second destination is a local vPort. Support this scenario by creating a multi-destination FTE using the firmware's extended destination capabilities. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f493f155 |
|
01-Dec-2018 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5e: Move flow attr reformat action bit to per dest flags Flow attr reformat action bit is moved from the global action bits to a per destination flags field, as a pre-step for adding additional flags to support encapsulation properties per destination, with no functionality change. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
df65a573 |
|
01-Dec-2018 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5e: Refactor eswitch flow attr for destination specific properties Currently the eswitch flow attr structure stores each destination specific property in its own specific array. Group them in an array of destination structures as a pre-step towards adding additional destination specific field properties. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e85e02ba |
|
23-Nov-2018 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5: E-Switch, Rename esw attr mirror count field The mirror count esw attributes field is used to determine if splitting the rule to two FTEs is required while programming e-switch mirroring. Rename it to split count, making it clearer with no functional change. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
aa39c2c0 |
|
10-Dec-2018 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5: E-Switch, Change vhca id valid bool field to bit flag Change the driver flow destination struct to use bit flags with the vhca id valid being the 1st one. The flags field is more extendable and will be used in downstream patch. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
bf07aa73 |
|
02-Sep-2018 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5e: Support offloading tc priorities and chains for eswitch flows Currently we fail when user specify a non-zero chain, this patch adds the support for it and tc priorities. To get to a new chain, use the tc goto action. Currently we support a fixed prio range 1-16, and chain range 0-3. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c92a0b94 |
|
04-Sep-2018 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Enable setting goto slow path chain action A pre-step for the tc offloads code to use this when a neigh is not available for encap rules. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
42f7ad67 |
|
02-Sep-2018 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5e: For TC offloads, always add new flow instead of appending the actions When replacing a tc flower rule, flower first requests to add the new rule (new action), then deletes the old one. But currently when asked to add a new tc flower flow, we append the actions (and counters to it). This can result in a fte with two flow counters or conflicting actions (drop and encap action) which firmware complains/errs about and isn't achieving what the user aimed for. Instead, insert the flow using the new no-append flag which will add a new HW rule, the old flow and rule will be deleted later by flower Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanmox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e52c2802 |
|
02-Jul-2018 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Add chains and priorities A chain is a group of priorities, so use the fdb parallel sub namespaces to implement chains, and a flow table for each priority in them. Because these namespaces are parallel and in series to the slow path fdb, the chains aren't connected to one another (but to the slow path), and one must use an explicit goto action to reach a different chain. Flow tables for the priorities will be created on demand and destroyed once not used. The Firmware has four pools of tables for sizes S/XS/M/L (4k, 64k, 1m, 4m). We maintain ghost copies of the pools' occupancy. When a new table is to be created, we scan the pools from large to small and find the 1st table size which can be now created. When a table is destroyed, we update the relevant pool. Multi chain/prio isn't enabled yet by this patch, for now all flows will use the default chain 0, and prio 1. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
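The large-to-small pool scan described in this commit might look roughly like the sketch below. The four sizes come from the message; the per-pool free counts, the struct layout, and the function names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NUM_POOLS 4

/* Table sizes from the commit message, ordered large to small. */
static const unsigned int sketch_pool_sizes[SKETCH_NUM_POOLS] = {
    4u * 1024 * 1024,   /* 4m */
    1u * 1024 * 1024,   /* 1m */
    64u * 1024,         /* 64k */
    4u * 1024,          /* 4k */
};

/* Driver-side shadow ("ghost") copy of how many tables each pool has left. */
struct sketch_pools {
    unsigned int free[SKETCH_NUM_POOLS];
};

/*
 * Scan from the largest size down and take the first table size that
 * can still be created. Returns the size taken, or 0 if every pool is
 * exhausted.
 */
static unsigned int sketch_pool_get(struct sketch_pools *pools)
{
    size_t i;

    for (i = 0; i < SKETCH_NUM_POOLS; i++) {
        if (pools->free[i] > 0) {
            pools->free[i]--;
            return sketch_pool_sizes[i];
        }
    }
    return 0;
}

/* On table destruction, give the slot back to the pool of that size. */
static void sketch_pool_put(struct sketch_pools *pools, unsigned int size)
{
    size_t i;

    for (i = 0; i < SKETCH_NUM_POOLS; i++) {
        if (sketch_pool_sizes[i] == size) {
            pools->free[i]++;
            return;
        }
    }
}
```

Keeping the shadow counts in the driver avoids asking firmware whether a table of a given size can be created before each allocation.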
|
#
48265006 |
|
20-Sep-2018 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Have explicit API to delete fwd rules Be symmetric with the e-switch API to add rules which has a specific function to add fwd rules which are used as part of vport mirroring. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
171c7625b |
|
02-Oct-2018 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: Use flow counter IDs and not the wrapping cache object Currently, when a flow rule is created using the FS core layer, the caller has to pass the entire flow counter object and not just the counter HW handle (ID). This requires both the FS core and the caller to have knowledge about the inner implementation of the FS layer flow counters cache and limits the possible users. Move to use the counter ID across the place when dealing with flows. With this decoupling, we can now privatize the inner implementation of the flow counters. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b8aee822 |
|
02-Oct-2018 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Get counters for offloaded flows from callers There's no real reason for the e-switch logic to manage the creation of counters for offloaded flows. The API already has the directive for the caller to denote they want to attach a counter to the created flow. As such, we go and move the management of flow counters to the mlx5e tc offload logic. This also lets us remove an inelegant interface where the FS layer had to provide a way to retrieve a counter from a flow rule. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
8c98ee77 |
|
05-Aug-2018 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5e: E-Switch, Add extack messages to devlink callbacks Return extack messages for failures in the e-switch devlink callbacks. Messages provide reasons for not being able to issue the operation. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
db7ff19e |
|
15-Aug-2018 |
Eli Britstein <elibr@mellanox.com> |
devlink: Add extack for eswitch operations Add extack argument to the eswitch related operations. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c966f7d5 |
|
16-Aug-2018 |
Gavi Teitz <gavi@mellanox.com> |
net/mlx5: E-Switch, Provide flow dest when creating vport rx rule Currently the destination for the representor e-switch rx rule is a TIR number. Towards changing that to potentially be a flow table, as part of enabling RSS for representors, modify the signature of the related e-switch API to get a flow destination. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c88a026e |
|
21-Aug-2018 |
Raed Salem <raeds@mellanox.com> |
net/mlx5: E-Switch, Fix memory leak when creating switchdev mode FDB tables The memory allocated for the slow path table flow group input structure was not freed upon successful return, fix that. Fixes: 1967ce6ea5c8 ("net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev mode") Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
60786f09 |
|
28-Aug-2018 |
Mark Bloch <markb@mellanox.com> |
{net, RDMA}/mlx5: Rename encap to reformat packet Renames all encap mlx5_{core,ib} code to use the new naming of packet reformat. This change doesn't introduce any function change and is needed to properly reflect the operation being done by this action. For example not only can we encapsulate a packet, but also decapsulate it. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
61444b45 |
|
28-Aug-2018 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: Break encap/decap into two separated flow table creation flags Today we are able to attach encap and decap actions only to the FDB. In preparation to enable those actions on the NIC flow tables, break the single flag into two. Those flags control whether decap or encap operations can be attached to the flow table created. For FDB, if encapsulation is required, we set both of them. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
cc495188 |
|
25-Apr-2018 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5e: Support offloading double vlan push/pop tc actions As we can configure two push/pop actions in one flow table entry, add support to offload those double vlan actions in a rule to HW. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
1482bd3d |
|
02-Jul-2018 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5e: Refactor tc vlan push/pop actions offloading Extract actions offloading code to a new function, and also extend data structures for double vlan actions. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
8da6fe2a |
|
16-Jul-2018 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5: Add core support for double vlan push/pop steering action As newer firmware supports double push/pop in a single FTE, we add core bits and extend vlan action logic for it. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
733d3e54 |
|
31-May-2018 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Avoid dealing with vport representors if not being e-switch manager In smartnic env, the host (PF) driver might not be an e-switch manager, hence the switchdev mode representors are running on the embedded cpu (EC) and not at the host. As such, we should avoid dealing with vport representors if not being esw manager. While here, make sure to disallow eswitch switchdev related setups through devlink if we are not esw managers. Fixes: cb67b832921c ('net/mlx5e: Introduce SRIOV VF representors') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e4ad91f2 |
|
16-May-2018 |
Chris Mi <chrism@mellanox.com> |
net/mlx5e: Split offloaded eswitch TC rules for port mirroring If a TC rule needs to be split for mirroring, create two HW rules, in the first level and the second level flow tables accordingly. In the first level flow table, forward the packet to the mirror port and forward the packet to the second level flow table for further processing, eg. encap, vlan push or header re-write. Currently the matching is repeated in both stages. While here, simplify the setup of the vhca id valid indicator also in the existing code. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
592d3651 |
|
03-May-2018 |
Chris Mi <chrism@mellanox.com> |
net/mlx5e: Parse mirroring action for offloaded TC eswitch flows Currently, we only support the mirred redirect TC sub-action. In order to support flow based vport mirroring, add support to parse the mirred mirror sub-action. For mirroring, user-space will typically set the action order such that the mirror port (mirror VF) sees packets as the original port (VF under mirroring) sent them or as it will receive them. In the general case, it means that packets are potentially sent to the mirror port before or after some actions were applied on them. To properly do that, we should follow on the exact action order as set for the flow and make sure this will also be the case when we program the HW offload. We introduce a counter for the output ports (attr->out_count), which we increase when parsing each mirred redirect/mirror sub-action and when dealing with encap. We introduce a counter (attr->mirror_count) telling us if split is needed. If no split is needed and mirroring is just multicasting to vport, the mirror count is zero, all the actions of the TC flow should apply on that single HW flow. If split is needed, the mirror count tells where to do the split, all non-mirred tc actions should apply only after the split. The mirror count is set while parsing the following actions encap/decap, header re-write, vlan push/pop. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a842dd04 |
|
16-May-2018 |
Chris Mi <chrism@mellanox.com> |
net/mlx5: E-switch, Create a second level FDB flow table If firmware supports the forward action with a destination list that includes a flow table, create a second level FDB flow table. This is going to be used for flow based mirroring under the switchdev offloads mode. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
52fff327 |
|
16-May-2018 |
Chris Mi <chrism@mellanox.com> |
net/mlx5: E-Switch, Reorganize and rename fdb flow tables We have several fdb flow tables for each of the legacy and switchdev modes. In the switchdev mode, there are fast path and slow path flow tables. Towards adding more flow tables in upcoming patches, reorganize and rename the various existing ones to reflect their functionality. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
10ff5359 |
|
18-Mar-2018 |
Shahar Klein <shahark@mellanox.com> |
net/mlx5e: Explicitly set source e-switch in offloaded TC rules Set a specific source e-switch when setting a rule that matches on the ingress port. Signed-off-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
56e858df |
|
18-Mar-2018 |
Rabie Loulou <rabiel@mellanox.com> |
net/mlx5e: Explicitly set destination e-switch in FDB rules Set a specific destination e-switch when setting a destination vport. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b17f7fc1 |
|
21-Mar-2018 |
Shahar Klein <shahark@mellanox.com> |
net/mlx5: Add destination e-switch owner The destination e-switch owner allows a rule in namespace of one e-switch owner to point to a vport that is natively associated with another e-switch owner. Signed-off-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e574978a |
|
16-May-2018 |
Christophe JAILLET <christophe.jaillet@wanadoo.fr> |
net/mlx5: Eswitch, Use 'kvfree()' for memory allocated by 'kvzalloc()' When 'kvzalloc()' is used to allocate memory, 'kvfree()' must be used to free it. Fixes: fed9ce22bf8ae ("net/mlx5: E-Switch, Add API to create vport rx rules") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
38aa51c1 |
|
05-Apr-2018 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Support offloaded TC flows with no matches on headers For example: tc filter add dev ens2f0_0 parent ffff: flower skip_sw action drop Note that for eswitch flows, we still always match on the source port. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6acfbf38 |
|
31-Jan-2018 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Offload tc vlan push/pop using HW action Currently, we are emulating the offload of vlan push/pop actions using global setup as done by commit f5f82476090f ("net/mlx5: E-Switch, Support VLAN actions in the offloads mode"). With newer NICs, we can apply a flow action for that matter, do that while keeping the emulated path for the older HW brands. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
aa24670e |
|
30-Jan-2018 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Use same source for offloaded actions check Align the checks for modify header and encap actions with the rest of the code. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c5447c70 |
|
23-Jan-2018 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Reload IB interface when switching devlink modes Up until this point it wasn't possible to activate IB representors when switching to switchdev mode, remove this limitation. We trigger reload of the PF IB interface in order to make sure that already allocated resources are invalid and new resources will be opened correctly with all the limitations of switchdev mode applied (only raw packet capabilities, without RoCE). We also move the remove/add to a place where the E-Switch mode is set/unset to better control when to trigger this action, this will allow the IB side to start in the correct mode. For better code reuse, create a function which reloads an interface and export it. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f80be543 |
|
30-Jan-2018 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Optimize HW steering tables in switchdev mode Under switchdev mode we insert an eswitch miss rule causing any unmatched traffic to be sent towards the PF vport. This miss rule can be optimized if we break it into two: one for multicast traffic and the other for unicast. Breaking the miss rule into two (unicast and multicast) allows the firmware to program the hardware in a more efficient way. Using ConnectX-5 Ex with IXIA and testpmd (which use IB representors): IXIA -> NIC -> PF -> IB representor -> NIC -> VF: - Without this optimization: 9.2 MPPS. - With this optimization: 18 MPPS. VF -> NIC -> IB representor -> PF -> NIC -> IXIA: - Without this optimization: 17 MPPS. - With this optimization: 23.4 MPPS. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
cd3d07e7 |
|
23-Jan-2018 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Increase number of FTEs in FDB in switchdev mode The max FTE number should be the max number of SQs that can be opened. Ethernet representors open one SQ each. Once we add IB representors this will increase (depending on the user). For now, let's start with 31 per IB representor and increase it in the future if needed. This increase only affects the number of FTEs in the slow path FDB; offloaded rules (done via TC on the fast path portion of the FDB) aren't affected. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
57cbd893 |
|
16-Jan-2018 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Move representors definition to a global scope In preparation for IB representors, move representors structs to a global scope, also expose functions needed for registration, unregistration, eswitch mode and creating a flow rule to direct traffic from SQs to the right VF. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
22215908 |
|
27-Sep-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Add callback to get representor device Add a callback interface to get a protocol device (per representor type). The Ethernet representors will expose their netdev via this interface. This functionality can be later used by IB representor in order to find the corresponding net device representor. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a4b97ab4 |
|
07-Dec-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Create generic header struct to be used by representors Now that we don't store type dependent data in struct mlx5_eswitch_rep we can create a generic interface, and representor type. struct mlx5_eswitch_rep will store an array of interfaces, each interface is used by a different representor type. Once we moved to a more generic interface, rdma driver representors can be added and utilize the same mechanism as the Ethernet driver representors use. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5ed99fb4 |
|
07-Dec-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5e: Move ethernet representors data into separate struct Ethernet representors have a need to store data which is applicable only for them. Create a priv void pointer in struct mlx5_eswitch_rep and move mlx5e to store the relevant data there. As part of this change we also initialize rep_if in mlx5e_rep_register_vf_vports() as otherwise the E-Switch code will copy a priv value which is garbage. We also rename mlx5_eswitch_get_uplink_netdev() to mlx5_eswitch_get_uplink_priv() and make it return void *. This way E-Switch code doesn't need to deal with net devices and we leave the task of getting it to mlx5e. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
159fe639 |
|
07-Dec-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Create a dedicated send to vport rule deletion function In order for representors to send packets directly to VFs we use an E-Switch function that inserts special rules into the HW. For symmetry, create an E-Switch function that deletes these rules as well. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f7a68945 |
|
07-Dec-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Move mlx5e only logic outside E-Switch In our pursuit to cleanup e-switch sub-module from mlx5e specific code, we move the functions that insert/remove the flow steering rules that allow mlx5e representors to send packets directly to VFs into the EN driver code. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
4c66df01 |
|
24-Aug-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Simplify representor load/unload callback API In the load() callback for loading representors we don't really need struct mlx5_eswitch but struct mlx5_core_dev, pass it directly. In the unload() callback for unloading representors we don't need the struct mlx5_eswitch argument, remove it. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6ed1803a |
|
09-Aug-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Refactor load/unload of representors Refactor the load/unload stages for better code reuse. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e8d31c4d |
|
09-Aug-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Refactor vport representors initialization Refactor the init stage of vport representors registration. vport number and hw id can be assigned by the E-Switch driver and not by the netdevice driver. While here, make the error path of mlx5_eswitch_init() a reverse order of the good path, also use kcalloc to allocate an array instead of kzalloc. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
4c5009c5 |
|
18-Oct-2017 |
Rabie Loulou <rabiel@mellanox.com> |
net/mlx5: Initialize destination_flow struct to 0 This is needed in order to enlarge it with more members that will get value of 0 when not set. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
19122039 |
|
01-Aug-2017 |
Shahar Klein <shahark@mellanox.com> |
net/mlx5: E-Switch, Unload the representors in the correct order When changing from switchdev to legacy mode, all the representor port devices (uplink nic and reps) are cleaned up. Part of this cleaning process is removing the neigh entries and the hash table containing them. However, a representor neigh entry might be linked to the uplink port hash table and if the uplink nic is cleaned first the cleaning of the representor will end up in null deref. Fix that by unloading the representors in the opposite order of load. Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors") Signed-off-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a8ffcc74 |
|
09-Jul-2017 |
Rabie Loulou <rabiel@mellanox.com> |
net/mlx5: Increase the maximum flow counters supported Read new NIC capability field which represents 16 MSBs of the max flow counters number supported (max_flow_counter_31_16). Backward compatibility with older firmware is preserved, the modified driver reads max_flow_counter_31_16 as 0 from the older firmware and uses up to 64K counters. Changed flow counter id from 16 bits to 32 bits. Backward compatibility with older firmware is preserved as we kept the 16 LSBs of the counter id in place and added 16 MSBs from reserved field. Changed the background bulk reading of flow counters to work in chunks of at most 32K counters, to make sure we don't attempt to allocate very large buffers. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
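The capability and chunking arithmetic described in a8ffcc74 can be sketched in plain user-space C. This is only an illustration, not the driver code: the function names, the `lo16` parameter (standing in for the pre-existing 16-bit capability field), and the `BULK_CHUNK_MAX` macro are all hypothetical, and only `max_flow_counter_31_16` and the 32K chunk limit come from the commit message.

```c
#include <stdint.h>

/* Hypothetical constant: the commit caps each bulk read at 32K counters. */
#define BULK_CHUNK_MAX (32u * 1024u)

/*
 * Combine the two 16-bit capability halves into one 32-bit maximum.
 * hi16 models max_flow_counter_31_16; older firmware reports it as 0,
 * so the result then stays within the 16-bit (64K) range.
 */
static uint32_t max_flow_counters(uint16_t hi16, uint16_t lo16)
{
    return ((uint32_t)hi16 << 16) | lo16;
}

/* Number of bulk queries needed to read `total` counters in chunks. */
static uint32_t bulk_query_count(uint32_t total)
{
    return total / BULK_CHUNK_MAX + (total % BULK_CHUNK_MAX != 0);
}
```

With `hi16 == 0` the result is simply `lo16`, which is the backward-compatibility behaviour the commit describes for older firmware.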
#
2fe30e23 |
|
28-May-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Avoid space after casting Fix checkpatch complaints on that: CHECK: No space is necessary after a cast Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e53eef63 |
|
28-May-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Align to match opening parenthesis Fixed checkpatch complaints of the form: CHECK: Alignment should match open parenthesis Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
9d1cef19 |
|
04-Jun-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Properly check applicability of devlink eswitch commands Currently we don't check that the link type is Eth and hence crash on IB ports when attempting to deref esw->xxx, fix that. To avoid repeating this check over and over, put the existing checks and the one on link type in a single helper. Fixes: 7768d1971de6 ('net/mlx5: E-Switch, Add control for encapsulation') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Mohamad Badarnah <mohamadb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
1b9a07ee |
|
10-May-2017 |
Leon Romanovsky <leon@kernel.org> |
{net, IB}/mlx5: Replace mlx5_vzalloc with kvzalloc Commit a7c3e901a46f ("mm: introduce kv[mz]alloc helpers") added proper implementation of mlx5_vzalloc function to the MM core. This made the mlx5_vzalloc function useless, so let's remove it. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
45247bf2 |
|
25-Apr-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Remove encap entry pointer from the eswitch flow attributes Encap wise, the tc eswitch flow attribute struct needs to have only the encap ID which is programmed later to the HW and none of the higher level encap params, fix that. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c415f704 |
|
30-Mar-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5 On ConnectX5 the wqe inline mode is "none" and hence the FW reports MLX5_CAP_INLINE_MODE_NOT_REQUIRED. Fix our devlink callbacks to deal with that on get and set. Also fix the tc flow parsing code not to fail anything when inline isn't required. Fixes: bffaa916588e ('net/mlx5: E-Switch, Add control for inline mode') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
7768d197 |
|
25-Sep-2016 |
Roi Dayan <roid@mellanox.com> |
net/mlx5: E-Switch, Add control for encapsulation Implement the devlink e-switch encapsulation control set and get callbacks. Apply the value set by the user on the switchdev offloads mode when creating the fast FDB table where offloaded rules will be set. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
1967ce6e |
|
14-Feb-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev mode Refactor the creation of the fast path FDB table that holds the offloaded rules in SRIOV switchdev mode into its own function. This will be used in the next patch to be able to re-create the table under different settings without going through legacy mode. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b3ba5149 |
|
12-Apr-2017 |
Erez Shitrit <erezsh@mellanox.com> |
net/mlx5: Refactor create flow table method to accept underlay QP IB flow tables need the underlay qp to perform flow steering. Here we change the API of the flow tables creation to accept the underlay QP number as a parameter in order to support IB (IPoIB) flow steering. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d7e75a32 |
|
25-Jan-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions This includes calling the parsing code that translates from pedit speak to the HW API, allocation (deallocation) of a modify header context and setting the modify header id associated with this context to the FTE of that flow. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
aa0cbbae |
|
14-Mar-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Properly deal with resource cleanup when adding TC flow fails The code for adding tc fdb flows leaves things half set when it fails in the middle. Currently we are not leaking things (e.g. eswitch vlan reference, encap reference and HW resources) since the main code to add flower rules does a cleanup by calling mlx5e_tc_del_flow(). This cleanup further works just b/c we're checking there if the HW rule for the flow we are attempting to delete is valid before touching it, and since under the current possible combinations of supported actions it's okay to go and blindly deref or delete all the action related resources (encap, vlan). Instead, do things properly, namely make sure that if add flow fails we clean all what was allocated or referenced. Now, the flow delete code can blindly deref/deallocate both the rule and the actions related resources and when more action combinations are introduced (such as the upcoming header re-write) we are fine with clear and robust code. While here, align all of nic/fdb parse actions/add flow functions to get mlx5e_tc_flow struct param and pick the attributes or whatever else needed from there. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
375f51e2 |
|
21-Mar-2017 |
Roi Dayan <roid@mellanox.com> |
net/mlx5: E-Switch, Don't allow changing inline mode when flows are configured Changing the eswitch inline mode can potentially cause already configured flows not to match the policy. E.g. set policy L4, add some L4 rules, set policy to L2 --> bad! Hence we disallow it. Keep track of how many offloaded rules are now set and refuse inline mode changes if this isn't zero. Fixes: bffaa916588e ("net/mlx5: E-Switch, Add control for inline mode") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
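The guard described in 375f51e2 (count the offloaded rules, refuse an inline mode change while the count is non-zero) can be sketched as below. This is a user-space model under stated assumptions: `struct esw_model`, its fields, the function names, and the choice of `-EOPNOTSUPP` as the error code are all hypothetical and not taken from the driver.

```c
#include <errno.h>

/* Hypothetical model of the e-switch state relevant to this check. */
struct esw_model {
    int inline_mode;        /* currently configured inline mode */
    unsigned int num_flows; /* number of offloaded rules currently set */
};

/* Called when a rule is offloaded or removed, to keep the count current. */
static void esw_flow_added(struct esw_model *esw)   { esw->num_flows++; }
static void esw_flow_removed(struct esw_model *esw) { esw->num_flows--; }

/*
 * Refuse to change the inline mode while any offloaded rule exists,
 * because existing rules might no longer match the new policy.
 * The error code is an assumption made for this sketch.
 */
static int esw_set_inline_mode(struct esw_model *esw, int mode)
{
    if (esw->num_flows > 0)
        return -EOPNOTSUPP;
    esw->inline_mode = mode;
    return 0;
}
```

Once every offloaded rule has been removed, the count returns to zero and the mode change is accepted again.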
#
d85cdccb |
|
21-Mar-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Change the TC offload rule add/del code path to be per NIC or E-Switch Refactor the code to deal with add/del TC rules to have handler per NIC/E-switch offloading use case, and push the latter into the e-switch code. This provides better separation and is to be used in down-stream patch for applying a fix. Fixes: bffaa916588e ("net/mlx5: E-Switch, Add control for inline mode") Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5bae8c031 |
|
15-Jan-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Re-enable RoCE on mode change only after FDB destroy We must re-enable RoCE on the e-switch management port (PF) only after destroying the FDB in its switchdev/offloaded mode. Otherwise, when encapsulation is supported, this re-enablement will fail. Also, it's more natural and symmetric to disable RoCE on the PF before we create the FDB under switchdev mode, so do that as well and revert if getting into error during the mode change later. Fixes: 9da34cd34e85 ('net/mlx5: Disable RoCE on the e-switch management [..]') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5403dc70 |
|
11-Jan-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Err when retrieving steering name-space fails Make sure to return error when we failed retrieving the FDB steering name space. Also, while around, correctly print the error when mode change revert fails in the warning message. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
eff596da |
|
12-Jan-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Return EOPNOTSUPP when failing to get steering name-space When we fail to retrieve a hardware steering name-space, the returned error code should say that this operation is not supported. Align the various places in the driver where this call is made to this convention. Also, make sure to warn when we fail to retrieve a SW (ANCHOR) name-space. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
9eb78923 |
|
11-Jan-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Change ENOTSUPP to EOPNOTSUPP As ENOTSUPP is specific to NFS, change the return error value to EOPNOTSUPP in various places in the mlx5 driver. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Suggested-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
264d7bf3 |
|
19-Dec-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Enlarge the FDB size for the switchdev mode The E-Switch FDB size was hard coded to 8k. Change it to be min(max eswitch table size, max flow counters * num flow groups) where the max values are read from the firmware and the number of flow groups is hard-coded as before this change. We don't know upfront the division of flows to group. This setup allows each group to be of size up to the where we want to support (we mandate pairing of flows with counters for offloading). Thus, we don't expect multiple occurrences for a group which in turn adds steering hops. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Tested-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
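The sizing rule stated in 264d7bf3, min(max eswitch table size, max flow counters * num flow groups), is shown here as a small stand-alone C helper. The function and parameter names are hypothetical, and the inputs are plain entry counts for simplicity; how the driver actually reads these limits from firmware is not shown in the commit message and is not modelled here.

```c
#include <stdint.h>

/*
 * FDB size as described in the commit message:
 *   min(max eswitch table size, max flow counters * num flow groups)
 * 64-bit arithmetic is used so the product cannot overflow for
 * 32-bit-sized inputs.
 */
static uint64_t esw_fdb_size(uint32_t max_table_size,
                             uint32_t max_flow_counters,
                             uint32_t num_flow_groups)
{
    uint64_t by_counters = (uint64_t)max_flow_counters * num_flow_groups;

    return by_counters < max_table_size ? by_counters : max_table_size;
}
```

For instance, with a table limit of 1,048,576 entries, 65,536 counters and 4 groups, the counter-based bound (262,144) is the smaller one and determines the size.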
#
9da34cd3 |
|
28-Dec-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Disable RoCE on the e-switch management port under switchdev mode Under the switchdev/offloads mode, packets that don't match any e-switch steering rule are sent towards the e-switch management port. We use a NIC HW steering rule set per vport (uplink and VFs) to make them be received into the host OS through the respective vport representor netdevice. Currently such missed RoCE packets will not get to this NIC steering rule, and hence VF RoCE will not work over the slow path of the offloads mode. This is b/c these packets will be matched by a steering rule added by the firmware that serves RoCE traffic set on the PF NIC vport which is also the e-switch management port under SRIOV. Disabling RoCE on the e-switch management vport when we are in the offloads mode, will signal to the firmware to remove their RoCE rule, and then the missed RoCE packets will be matched by the representor NIC steering rule as any other missed packets. To achieve that, we disable RoCE on the PF vport. We do that by removing (hot-unplugging) the IB device instance associated with the PF. This is also required by our current model where the PF serves as the uplink representor and hence only SW switching (TC, bridge, OVS) applications and slow path vport mlx5e net-device should be running over that vport. Fixes: c930a3ad7453 ('net/mlx5e: Add devlink based SRIOV mode changes') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
726293f1 |
|
01-Dec-2016 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx5e: Save the represntor netdevice as part of the representor Replace the representor private data to a net_device pointer holding the representor netdevice, instead of void pointer holding mlx5e_priv. It will be used by a new eswitch service function, returning the uplink representor netdevice. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bffaa916 |
|
22-Nov-2016 |
Roi Dayan <roid@mellanox.com> |
net/mlx5: E-Switch, Add control for inline mode Implement devlink show and set of HW inline-mode. The supported modes: none, link, network, transport. We currently support one mode for all vports so set is done on all vports. When eswitch is first initialized the inline-mode is queried from the FW. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a54e20b4 |
|
07-Nov-2016 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx5e: Add basic TC tunnel set action for SRIOV offloads In mlx5 HW, encapsulation is offloaded by the steering rule having index into an encapsulation table containing the entire set of headers to be added by the HW. The driver sets these headers in a buffer when we are offloading the action. The code maintains mlx5_encap_entry for each encap header it has encountered when attempted to offload TC tunnel set action. This entry maintains a linked list of all the flows sharing the same encap header, when the last flow is removed from the list the encap entry is removed. The actual encap_header is allocated by the driver in the hardware only if we have layer two neighbour info when the encap entry is created. While the flow is in the driver, the driver holds a reference on the neighbour. When a new flow with encap action is inserted, the code first checks if the required encap entry exists according to the tunnel set parameters. If it does the encap is shared, otherwise a new mlx5_encap_entry is created. TC action parsing implementation in the driver assumes that tunnel set action is provided in the same order set by the user, e.g before the mirred_redirect action. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bbd00f7e |
|
07-Nov-2016 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx5e: Add TC tunnel release action for SRIOV offloads Enhance the parsing of offloaded TC rules to set HW matching on outer (encapsulation) headers. Parse TC tunnel release action and set it as mlx5 decap action when the required capabilities are supported. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
66958ed9 |
|
07-Nov-2016 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx5: Support encap id when setting new steering entry In order to support steering rules which add encapsulation headers, encap_id parameter is needed. Add new mlx5_flow_act struct which holds action related parameter: action, flow_tag and encap_id. Use mlx5_flow_act struct when adding a new steering rule. This patch doesn't change any functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c9f1b073 |
|
07-Nov-2016 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx5: Add creation flags when adding new flow table When creating flow tables, allow the caller to specify creation flags. Currently no flags are used and as such this patch doesn't add any new functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee39fbc4 |
|
03-Nov-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Set the actions for offloaded rules properly As for the current generation of the mlx5 HW (CX4/CX4-Lx) per flow vlan push/pop actions are emulated, we must not program them to the firmware. Fixes: f5f82476090f ('net/mlx5: E-Switch, Support VLAN actions in the offloads mode') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e37a79e5 |
|
20-Sep-2016 |
Mark Bloch <markb@mellanox.com> |
net/mlx5e: Add tc support for FWD rule with counter When creating a FWD rule using tc create also a HW counter for this rule. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
74491de9 |
|
31-Aug-2016 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: Add multi dest support Currently when calling mlx5_add_flow_rule we accept only one flow destination, this commit allows to pass multiple destinations. This change forces us to change the return structure to a more flexible one. We introduce a flow handle (struct mlx5_flow_handle), it holds internally the number of rules created and holds an array where each cell points to a flow rule. From the consumers (of mlx5_add_flow_rule) point of view this change is only cosmetic and requires only to change the type of the returned value they store. From the core point of view, we now need to use a loop when allocating and deleting rules (e.g given to us a flow handler). Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
f5f82476 |
|
22-Sep-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Support VLAN actions in the offloads mode Many virtualization systems use a policy under which a vlan tag is pushed to packets sent by guests, and popped before the packet is forwarded to the VM. The current generation of the mlx5 HW doesn't fully support that on a per flow level. As such, we are addressing the above common use case with the SRIOV e-Switch abilities to push vlan into packets sent by VFs and pop vlan from packets forwarded to VFs. The HW can match on the correct vlan being present in packets forwarded to VFs (eSwitch steering is done before stripping the tag), so this part is offloaded as is. A common practice for vlans is to avoid both push vlan and pop vlan for inter-host VM/VM (east-west) communication because in this case, push on egress cancels out with pop on ingress. For supporting that, we use a global eswitch vlan pop policy, hence allowing guest A to communicate with both remote VM B and local VM C. This works since the HW pops the vlan only if it exists (e.g for C --> A packets but not for B --> A packets). On the slow path, when a VF vport has an offloaded flow which involves pushing vlans, whereas another flow is not currently offloaded, the packets from the 2nd flow seen by the VF representor on the host have vlan. The VF rep driver removes such vlan before calling into the host networking stack. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
776b12b6 |
|
22-Sep-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Put elements related to offloaded TC rule in one struct Put the representors related to the source and dest vports and the action in struct mlx5_esw_flow_attr which is used while setting the FDB rule. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bac9b6aa |
|
22-Sep-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Set vport representor fields explicitly on registration The structure we use for the eswitch vport representor (mlx5_eswitch_rep) has some fields which are set from upper layers in the driver when they register the rep. Use explicit setting on registration time for them and avoid global memcpy. This patch doesn't add new functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9deb2241 |
|
22-Sep-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Set the vport when registering the uplink rep Set the vport value in the PF entry to be that of the uplink so we can use it blindly over the tc / eswitch offload code without translating it each time we deal with the uplink representor. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c419ba8 |
|
18-Sep-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Handle mode change failures E-switch mode changes involve creating HW tables, potentially allocating netdevices, etc, and things can fail. Add an attempt to rollback to the existing mode when changing to the new mode fails. Only if rollback fails, getting proper SRIOV functionality requires module unload or sriov disablement/enablement. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
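The rollback behaviour described in the entry above can be sketched as a small state machine. This is a simplified user-space illustration, not the driver's code: `struct eswitch`, `enable_mode()`, `change_mode()` and the `fail_offloads` knob are hypothetical names introduced here, and the error values are arbitrary.

```c
#include <stdbool.h>

/* Hypothetical modes; the log mentions NONE, LEGACY and OFFLOADS. */
enum esw_mode { MODE_NONE, MODE_LEGACY, MODE_OFFLOADS };

struct eswitch {
	enum esw_mode mode;
	bool fail_offloads;   /* illustration knob: make OFFLOADS setup fail */
};

/* Stand-in for creating tables, netdevices, etc. May fail. */
static int enable_mode(struct eswitch *esw, enum esw_mode mode)
{
	if (mode == MODE_OFFLOADS && esw->fail_offloads)
		return -1;
	esw->mode = mode;
	return 0;
}

/*
 * Tear down the current mode, try the new one, and attempt to restore
 * the previous mode if that fails. Returns 0 on success, -1 if the new
 * mode failed but the rollback worked, -2 if the rollback failed too.
 */
static int change_mode(struct eswitch *esw, enum esw_mode new_mode)
{
	enum esw_mode old_mode = esw->mode;
	int err;

	esw->mode = MODE_NONE;            /* disable the current mode */

	err = enable_mode(esw, new_mode);
	if (err && enable_mode(esw, old_mode))
		return -2;                /* left in MODE_NONE */

	return err;
}
```

In this sketch the `-2` case corresponds to the situation the log describes, where recovering proper functionality needs a module unload or disabling and re-enabling SRIOV.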
#
1a8ee6f2 |
|
18-Aug-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Set the send-to-vport rules in the correct table While adding actual offloading support to the new switchdev mode, we didn't change the setup of the send-to-vport rules to put them in the slow path table, fix that. Fixes: 1033665e63b6 ('net/mlx5: E-Switch, Use two priorities for SRIOV offloads mode') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ef78618b |
|
18-Aug-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Return the correct devlink e-switch mode Since mlx5 has also the NONE e-switch mode, we must translate from mlx5 mode to devlink mode on the devlink eswitch mode get call, do that. While here, remove the mlx5_ prefix from the static function helpers that deal with the mode to comply with the rest of the code. Fixes: c930a3ad7453 ('net/mlx5e: Add devlink based SRIOV mode change') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d80d1a2 |
|
14-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Add API to configure rules for the offloaded mode This allows for upper levels in the driver, e.g the TC offload code to add e-switch offloaded steering rules. The caller provides the rule spec for matching, action, source and destination vports. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1033665e |
|
14-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Use two priorities for SRIOV offloads mode In the offloads mode, some slow path rules are added by the driver (e.g send-to-vport), while offloaded rules are to be added from upper layers. The slow path rules have lower priority and we don't want matching on offloaded rules to suffer from extra steering hops related to the slow path rules. We use two priorities, one for offloaded rules (fast path), and one for the control rules (slow path). To allow for that, we enable two priorities for the FDB namespace in the FS core code. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
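The two-priority arrangement in the entry above can be illustrated with a toy lookup that consults fast-path rules before slow-path rules. This is a conceptual sketch only: `struct rule`, `MATCH_ANY`, and `fdb_lookup()` are hypothetical, matching on a single source-vport number is a simplification, and the catch-all slow-path entry is an assumption used to show a fallback.

```c
#include <stddef.h>

#define MATCH_ANY (-1)   /* hypothetical wildcard for a catch-all rule */

/* Hypothetical rule: match on source vport, forward to dest vport. */
struct rule {
	int match_vport;
	int dest_vport;
};

static int rule_matches(const struct rule *r, int src_vport)
{
	return r->match_vport == MATCH_ANY || r->match_vport == src_vport;
}

/*
 * Fast-path (offloaded) rules are checked first, so a packet that hits
 * one never pays for walking the slow-path (control) rules.
 * Returns the destination vport, or -1 if nothing matched.
 */
static int fdb_lookup(const struct rule *fast, size_t n_fast,
		      const struct rule *slow, size_t n_slow,
		      int src_vport)
{
	size_t i;

	for (i = 0; i < n_fast; i++)
		if (rule_matches(&fast[i], src_vport))
			return fast[i].dest_vport;

	for (i = 0; i < n_slow; i++)
		if (rule_matches(&slow[i], src_vport))
			return slow[i].dest_vport;

	return -1;
}
```

With one fast-path rule for vport 2 and a catch-all slow-path rule pointing at vport 0, traffic from vport 2 resolves in the first loop, while traffic from any other vport falls through to the slow path.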
#
c5bb1730 |
|
04-Jul-2016 |
Maor Gottlieb <maorg@mellanox.com> |
net/mlx5: Refactor mlx5_add_flow_rule Reduce the set of arguments passed to mlx5_add_flow_rule by introducing flow_spec structure. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cb67b832 |
|
01-Jul-2016 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx5e: Introduce SRIOV VF representors Implement the relevant profile functions to create mlx5e driver instance serving as VF representor. When SRIOV offloads mode is enabled, each VF will have a representor netdevice instance on the host. To do that, we also export set of shared service functions from en_main.c, such that they can be used by both NIC and representors netdevs. The newly created representor netdevice has a basic set of net_device_ops which are the same ndo functions as the NIC netdevice and an ndo of its own for phys port name. The profiling infrastructure allows sharing code between the NIC and the vport representor even though the representor has only a subset of the NIC functionality. The VF reps and the PF which is used in that mode to represent the uplink, expose switchdev ops. Currently the only op supported is attr get for the port parent ID which here serves to identify net-devices belonging to the same HW E-Switch. Other than that, no offloading is implemented and hence switching functionality is achieved if one sets SW switching rules, e.g. using tc, bridge or ovs. Port phys name (ndo_get_phys_port_name) is implemented to allow exporting to user-space the VF vport number and along with the switchdev port parent id (phys_switch_id) enable a udev based consistent naming scheme: SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \ ATTR{phys_port_name}!="", NAME="$PF_NIC$attr{phys_port_name}" where phys_switch_id is exposed by the PF (and VF reps) and $PF_NIC is the name of the PF netdevice. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
127ea380 |
|
01-Jul-2016 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx5: Add Representors registration API Introduce E-Switch registration/unregister representors functions. Those functions are called by the mlx5e driver when the PF NIC is created upon pci probe action regardless of the E-Switch mode (NONE, LEGACY or OFFLOADS). Adding basic E-Switch database that will hold the vport representors upon creation. This patch doesn't add any new functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c930a3ad |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Add devlink based SRIOV mode changes Implement handlers for the devlink commands to get and set the SRIOV E-Switch mode. When turning to the switchdev/offloads mode, we disable the e-switch and enable it again in the new mode, create the NIC offloads table and create VF reps. When turning to legacy mode, we remove the VF reps and the offloads table, and re-initiate the e-switch in its legacy mode. The actual creation/removal of the VF reps is done in downstream patches. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
feae9087 |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Add devlink interface The devlink interface is initially used to set/get the mode of the SRIOV e-switch. Currently, these are only stubs for get/set; a downstream patch will actually fill them out. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fed9ce22 |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Add API to create vport rx rules Add the API to create vport rx rules of the form packet meta-data :: vport == $VPORT --> $TIR where the TIR is opened by this VF representor. This logic will be used for packets that didn't match any rule in the e-switch datapath and should be received into the host OS through the netdevice that represents the VF they were sent from. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c116c6ee |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Add offloads table Belongs to the NIC offloads name-space, and to be used as part of the SRIOV offloads logic to steer packets that hit the e-switch miss rule to the TIR of the relevant VF representor. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ab22be9b |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Add API to create send-to-vport rules Add the API to create send-to-vport e-switch rules of the form packet meta-data :: send-queue-number == $SQN and source-vport == 0 --> $VPORT These rules are to be used for a send-to-vport logic which conceptually bypasses the "normal" steering rules currently present at the e-switch datapath. Such a rule should apply only for packets that originate in the e-switch manager vport (0) and are sent for a given SQN which is used by a given VF representor device, and hence the matching logic. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3aa33572 |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Add miss rule for offloads mode In the sriov offloads mode, packets that are not matched by any other rule should be sent towards the e-switch manager for further processing. Add such a "miss" rule which matches ANY packet as the last rule in the e-switch FDB and programs the HW to send the packet to vport 0 where the e-switch manager runs. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
69697b6e |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Add support for the sriov offloads mode Unlike the legacy mode, here, forwarding rules are not learned by the driver per events on macs set by VFs/VMs into their vports, but rather should be programmed by higher-level SW entities. Saying that, still, in the offloads mode (SRIOV_OFFLOADS), two flow groups are created by the driver for management (slow path) purposes: The first group will be used for sending packets over e-switch vports from the host OS where the e-switch management code runs, to be received by VFs. The second group will be used by a miss rule which forwards packets toward the e-switch manager. Further logic will trap these packets such that the receiving net-device as seen by the networking stack is the representor of the vport that sent the packet over the e-switch data-path. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|